Abstract: A systematic method for optimizing multivariate discriminants is developed and
applied to the important example of a light Higgs boson search at the Tevatron and the LHC.
The Significance Improvement Characteristic (SIC), defined as the signal efficiency of a cut or
multivariate discriminant divided by the square root of the background efficiency, is shown to
be an extremely powerful visualization tool. SIC curves demonstrate numerical instabilities
in the multivariate discriminants, show convergence as the number of variables is increased,
and display the sensitivity to the optimal cut values. For our application, we concentrate on
Higgs boson production in association with a W or Z boson with $H \to b\bar{b}$ and compare to the
irreducible standard model background, $Z/W + b\bar{b}$. We explore thousands of experimentally
motivated, physically motivated, and unmotivated single variable discriminants. Along with
the standard kinematic variables, a number of new ones, such as twist, are described which
should have applicability to many processes. We find that some single variables, such as the
pull angle, are weak discriminants, but when combined with others they provide important
marginal improvement. We also find that multiple Higgs boson-candidate mass measures,
such as from mildly and aggressively trimmed jets, when combined may provide additional
discriminating power. Comparing the significance improvement from our variables to those
used in recent CDF and D0 searches, we find that a 10-20% improvement in significance
against $Z/W + b\bar{b}$ is possible. Our analysis also suggests that the $H + W/Z$ channel with
$H \to b\bar{b}$ is viable at the LHC, without requiring a hard cut on the W/Z transverse
momentum.



1. Introduction 

Search strategies for new physics often depend on being able to find a small signal on top of 
a large background. In many cases, there are an enormous number of possible discriminants 
which one would ideally like to combine to maximize search sensitivity. One approach is to 
choose, by some means, a small set of well-understood and fairly uncorrelated variables and 
feed them into a multivariate discriminant such as an Artificial Neural Network (ANN) or 
a Boosted Decision Tree (BDT). This approach has been applied productively in the light 
Higgs boson searches at CDF [1] and D0 [2], [?]. While these multivariate techniques can
often improve discrimination power, it is almost impossible to follow their inner workings in 
full detail. A major concern is that they can pick up on unphysical features of the Monte 
Carlo samples used to train them, rather than real differences in the signal and background 
processes. At the same time, it is also unclear how dependent the absolute performance and 
robustness are on the initial choice of variables. While we generally expect that there are 
important multidimensional correlations that a computer can find better than a person, we 
would like to be able to capitalize on this fact in a more systematic manner. The goal of 
this paper is to develop such a systematic programme, showing how to produce combined 
discriminants that are more trustworthy and better optimized. Kinematic observables based 
on well-understood, hard, perturbative physics become our 4-5 most powerful discriminants.
Beyond that, it is useful to examine properties of the jets themselves affected by perturbative 
QCD emissions. 

The main example we consider is the production of a Higgs boson at the 14 TeV LHC in
association with a vector boson (Z or W), with the Higgs decaying to $b\bar{b}$. This process was
studied by both ATLAS and CMS. ATLAS, for example, concluded that the most promising
channel, $WH \to \ell\nu b\bar{b}$, would not provide enough significance for discovery [?]. More recently,
WH and ZH were revived with the observation that putting a hard cut on a single variable
can significantly enhance the signal-to-background ratio [?]. In particular, imposing a cut $p_T >$
200 GeV on the reconstructed Z reduces the signal by a factor of 20, but reduces the background
by a much larger factor of 320. This $p_T$ cut (used in conjunction with jet substructure
methods to optimize mass resolution and background shaping) reinstated WH and ZH as
possible Higgs discovery modes at the LHC. It is, at the outset, unclear whether a multivariate
approach would pick up on the fact that a hard cut on $p_T$ could make ZH or WH viable.
It is also unclear whether the hard $p_T$ cut is optimal, or whether better use could be made
of the 95% of ZH signal events which this cut throws out. We seek to optimize multivariate
searches for the standard model Higgs boson at the Tevatron and LHC.

With this motivation, our goal is to analyze systematically the entire phase space of ZH
production with $H \to b\bar{b}$ to find the kinematic regions that maximize signal significance. We
will concentrate on separating $ZH \to \ell^+\ell^- b\bar{b}$ from its irreducible background in the standard
model, for example, from $pp \to Zb\bar{b} \to \ell^+\ell^- b\bar{b}$. Here, $\ell$ is an electron or muon. We will
also require the b's to appear in separate jets. This dijet reconstruction approach continues
to work well at high Higgs boson $p_T$ (up to about 400 GeV). We note that our results are
complementary to any further improvements achievable using substructure-based techniques.
We will only briefly discuss the reducible backgrounds, such as mis-tagged b-jets. All of these
other backgrounds account for less than half of the total background at CDF [?]. We will
also show results for the WH channel as compared to its irreducible $\ell\nu b\bar{b}$ background. For
both ZH and WH, the reducible backgrounds are clearly important to establish the final
search reach, but restricting to irreducible backgrounds more concisely illustrates our main
observations.

Our first step is to develop and study a very broad range of single-variable discriminants.
We consider initially thousands of discriminants, including kinematic observables corresponding
to the "hard scattering" (e.g. $\Delta\eta$ between the b jets, $\hat{s}$, etc.) along with observables that
distinguish signal from background due to differences in the QCD radiation patterns (e.g.
color connections, subjet multiplicities, etc.). Some of these variables, such as twist and pull,
are more generally applicable. We also look at varying the jet sizes and jet algorithm, and the
effect of trimming. Many of the variables are very similar, but enough are sufficiently
independent to be considered separately.

Once the input variables are cataloged, we establish a criterion to evaluate their usefulness.
The ultimate measure is, of course, how much integrated luminosity the collider would
need to find (or reject) a hypothesized signal to a certain statistical significance, say $5\sigma$. Even
with this criterion, there are multiple measures of significance: should we compare the events
in a single optimized signal bin to the Monte Carlo prediction for signal and background?
Should we look for an excess in the signal region compared to the sidebands? Should we fit
curves and compare the $\chi^2$ for various simulated distributions? How do we treat the experimental
systematic uncertainties? In fact, it is nearly impossible to get an accurate measure of
the final search reach from a theoretical study without a collaboration-approved full
detector simulation. Nevertheless, the relative importance of different discriminants should
be roughly independent of the final search strategy. We therefore argue that the Significance
Improvement Characteristic (SIC), defined as the signal efficiency divided by the square root
of background efficiency resulting from a cut on a given discriminant,

$$\mathrm{SIC} \equiv \frac{\varepsilon_S}{\sqrt{\varepsilon_B}},$$

is a practical and useful measure. We will show that this quantity, viewed as a function
of $\varepsilon_S$, allows us to efficiently explore the convergence and limitations of the multivariate
combinations. One can also use the maximum of the SIC curve to rank variables and as a
quantitative measure of final efficiency.
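
The definition above can be sketched numerically. The following is a minimal illustration, not the analysis code used for this paper: the function names `sic` and `sic_curve`, the uniform threshold scan, and the "score > threshold" cut direction are all assumptions made for the example. Points where no background survives are returned as NaN, which is one simple way to handle the numerical instabilities that SIC curves show at low background efficiency.

```python
import numpy as np

def sic(eps_s, eps_b):
    """SIC = eps_S / sqrt(eps_B) for a single cut."""
    return eps_s / np.sqrt(eps_b)

def sic_curve(sig_scores, bkg_scores, n_cuts=200):
    """Scan cuts 'score > threshold' and return (eps_S, eps_B, SIC) arrays.

    The scores can be any single variable or multivariate output
    (hypothetical inputs; larger values are assumed to be more signal-like).
    """
    sig = np.asarray(sig_scores, dtype=float)
    bkg = np.asarray(bkg_scores, dtype=float)
    lo = min(sig.min(), bkg.min())
    hi = max(sig.max(), bkg.max())
    thresholds = np.linspace(lo, hi, n_cuts)
    eps_s = np.array([(sig > t).mean() for t in thresholds])
    eps_b = np.array([(bkg > t).mean() for t in thresholds])
    # SIC is undefined when no background events pass the cut: mark as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        curve = np.where(eps_b > 0, eps_s / np.sqrt(eps_b), np.nan)
    return eps_s, eps_b, curve

# A cut keeping 1/20 of the signal and 1/320 of the background:
print(round(float(sic(1 / 20, 1 / 320)), 3))  # 0.894
```

The usage line plugs in the efficiencies quoted in the Introduction for the $p_T > 200$ GeV cut on the reconstructed Z, taken on their own and without any of the accompanying substructure improvements.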

This paper is organized as follows: Section 2 describes the selection cuts we use for the
signal and background samples, and the resulting cross sections. We only consider irreducible
backgrounds, so the numbers presented do not directly translate into a discovery potential.
In Section 3, we describe our single variable discriminants. This section includes variables
which are useful at the hard parton level, such as jet $p_T$'s, physically motivated variables,
such as helicity angles, variables dependent on the radiation pattern, such as pull [?], and
some useful variables that do not have an obvious physical motivation as well. Section 4
discusses some efficiency measures and motivates the SIC curves. Section 5 gives a brief
introduction to some of the multivariate measures we consider, and explains some features
of boosted decision trees (BDTs), our method of choice. Section 6 describes improvements
when variables are combined and our algorithm for finding the optimal set. We show that
in the case of associated Higgs boson production the SIC curves for Boosted Decision Trees
continue to increase in performance until the 6th or 7th variable is added. The addition
of more variables does not provide additional discrimination; it does not improve the SIC.
The efficiencies used in these sections are all based on samples constrained to lie within
a fixed Higgs mass window using a particular jet algorithm. We justify this window and
algorithm in Section 7. Section 7 also shows that additional improvement may result from
combining multiple measures of the $b\bar{b}$ invariant mass, from different jet algorithms. For
the final discriminants the mass window is removed and $m_{b\bar{b}}$ is included in the multivariate
analysis. Section 8 shows the final discriminant combinations for the Tevatron and the LHC.
A summary and discussion is presented in Section 9.



2. Event Generation 

The bulk of this paper will refer to a reference sample of events generated initially with
MadGraph v4.4.26 [?]: $pp \to ZH \to \ell^+\ell^- b\bar{b}$ for signal and $pp \to Zb\bar{b} \to \ell^+\ell^- b\bar{b}$ for background
at $\sqrt{s} = 14$ TeV. These are then showered through Pythia v8.140 [12]. Jets are reconstructed
using FastJet v2.4.2 [?], and these (along with the leptons) serve as our "detector-level" objects.
The multivariate analysis is done using the TMVA v4.0.4 package [16] that comes with
ROOT v5.27.02 [17]. Our reference Higgs boson mass is $m_H = 120$ GeV throughout: above
the LEP limit of 115 GeV and below the region around 130-135 GeV where the $b\bar{b}$ decay no
longer dominates and the decay to $WW^*$ takes over [14]. We will also consider ZH events at
the Tevatron, $p\bar{p}$ with $\sqrt{s} = 1.96$ TeV, and WH events at the Tevatron and LHC. The WH events
will be compared to their $Wb\bar{b}$ irreducible backgrounds. Reducible backgrounds such as $Wjj$
with false b-tags or $t\bar{t}$ will not be considered, for simplicity.

Generator-level cuts are described in Table 1. It is important that the cuts applied at
the hard parton level, in MadGraph, not be as tight as the cuts used for the final jets. We
found that a factor of 2 margin was wide enough while maintaining acceptable generation
efficiency. We did not apply a cut on $m_{b\bar{b}}$ in the generated samples. Once this sample is
generated and showered, we require two b-tagged jets. Our operative definition of b-tagging
matches B-hadrons from the intermediate event record to final-state jets with $\Delta R_{jb}$ smaller
than the jet clustering radius. We then cut on the b-jet $p_T$ and rapidity.

We generated 3 million signal events and 30 million background events. After b-tagging
and detector cuts were applied, we were left with around 2M signal and 4M background
events. Within our fiducial Higgs mass window of 90 GeV < $m_{b\bar{b}}$ < 124 GeV, we ended
up with 1.5M signal and 0.6M background simulated events. This specific window will be
justified later as the one that maximizes significance.

hard-parton level cuts                     detector level cuts
-----------------------------------------  ---------------------------------------------
$p_T^b > 7$ GeV                            $p_T^b > 15$ GeV
$p_T^\ell > 3$ GeV                         $p_T^\ell > 6$ GeV
$p_T^{?} > 3$ GeV                          $p_T^{?} > 20$ GeV (LHC), 10 GeV (Tevatron)
$|\eta_b| < 5$ and $|\eta_\ell| < 5$       $|\eta_b| < 2.5$ and $|\eta_\ell| < 2.5$

Table 1: Cuts applied for event generation.

                                          LHC (14 TeV)                  Tevatron (1.96 TeV)
Integrated luminosity, $\int\mathcal{L}\,dt$     30 fb$^{-1}$                  10 fb$^{-1}$
                                          $pp \to ZH$   $pp \to Zb\bar{b}$    $p\bar{p} \to ZH$   $p\bar{p} \to Zb\bar{b}$
----------------------------------------  -----------   ------------------    -----------------   ------------------------
$\sigma$ times Branching Ratio            33.4 fb       57,200 fb             3.63 fb             1250 fb
After Generator-Level Cuts                31.5 fb       26,000 fb             3.40 fb             570 fb
Two b Tags % (of Gen-Level)               57%           25%                   81%                 25%
Higgs Mass Window % (of Gen-Level)        40%           4%                    52%                 3%
Initiated by gg (as opposed to $q\bar{q}$)       0%            90%                   0%                  27%
Xsec (in Higgs Mass Window)               12.3 fb       1100 fb               1.8 fb              14.9 fb
Events (Xsec x $\int\mathcal{L}\,dt$)            370           33,700                18                  149
Starting $B/S$                            91.1                                8.2
Starting $S/\sqrt{B}$                     2.02                                1.47

Table 2: Cross Sections for LHC and Tevatron signal and background. The lowest 6 rows refer to
a Higgs mass-window cut, 90 GeV < $m_{b\bar{b}}$ < 124 GeV, where the mass is computed from the hardest
two b-tagged $R = 0.5$ anti-$k_T$ jets. The significances here will be the baseline references from which we
compute fractional improvements. This applies even for samples without an $m_{b\bar{b}}$ cut where different
$m_{b\bar{b}}$ measures are used as part of a multivariate discriminant.
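
The last rows of Table 2 follow from the in-window cross sections and the integrated luminosity. The short check below is only a sketch: the helper name `baseline` is hypothetical, and because it starts from the rounded cross sections printed in the table, its output agrees with the tabulated event counts and ratios only to within rounding.

```python
import math

def baseline(xsec_sig_fb, xsec_bkg_fb, lumi_inv_fb):
    """Return (S, B, B/S, S/sqrt(B)) from cross sections in fb and luminosity in fb^-1."""
    s = xsec_sig_fb * lumi_inv_fb
    b = xsec_bkg_fb * lumi_inv_fb
    return s, b, b / s, s / math.sqrt(b)

# In-window cross sections and luminosities as printed in Table 2 (rounded values).
lhc = baseline(12.3, 1100.0, 30.0)
tevatron = baseline(1.8, 14.9, 10.0)

for name, (s, b, b_over_s, signif) in (("LHC", lhc), ("Tevatron", tevatron)):
    print(f"{name}: S={s:.0f}  B={b:.0f}  B/S={b_over_s:.1f}  S/sqrt(B)={signif:.2f}")
```

These baseline values of $S/\sqrt{B}$ are the quantities that a cut with a given SIC would multiply.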

The overall cross sections and efficiencies for these cuts are shown in Table 2. The row
in this table labeled "Higgs Mass Window %" refers to the percentage of events with
90 GeV < $m_{b\bar{b}}$ < 124 GeV (a window we find later to be optimal; see Section 7 below).
For this table, jets are found using the anti-$k_T$
algorithm with $R = 0.5$ and $m_{b\bar{b}}$ is the invariant mass of the two b-tagged jets in the event.
The "Xsec" row, and the rows below, provide cross sections and a normalization for the initial
significance. These are all after the $m_{b\bar{b}}$ mass window cuts, but with no other discriminating
variable applied. Our improvements will be compared to this significance. We will later treat
the mass window in a more sophisticated way, combining multiple mass measures and jet
algorithms. Here we find it pedagogically useful to have a standard set of reference efficiencies.

The row in the table labeled "Initiated by gg", which is also for events within the $m_{b\bar{b}}$
window, shows an important distinction between the Tevatron and the LHC. Note that the
signal is all $q\bar{q}$-initiated at both machines. Since the $Zb\bar{b}$ background at the LHC is mostly
gg-initiated, it will be easier to distinguish from signal than the backgrounds at the Tevatron,
which are mostly $q\bar{q}$-initiated, and therefore more similar to the signal.



3. Single Variable Discriminants 

In this section we catalog all of the variables we use as potential discriminants between signal
and background. The language will refer to the $ZH \to \ell^+\ell^- b\bar{b}$ process, although the same
variables can be used for the WH sample with $\ell\nu$ replacing $\ell^+\ell^-$. The variables will be split
roughly into two classes, kinematic and radiation. Kinematic variables are those which
are meaningful at the hard parton level, such as $m_{b\bar{b}}$. They are expressible in terms of the
4-momenta of the $\ell^+$, $\ell^-$, $b$, and $\bar{b}$. Although the kinematic variables can be defined at the
parton level, we will measure them using the b-jet momenta. Radiation variables, such as the
number of charged hadrons in a jet, are those which result mainly from QCD radiation. In
the Monte Carlo, these variables are populated by the parton shower.

To begin, consider how many independent degrees of freedom there are at the hard
parton level. We will treat showering and initial state radiation later. The underlying hard
process we are interested in is $pp \to ZH \to \ell^+\ell^- b\bar{b}$. The final state is characterized by
the 4-momenta of the four final state particles, which amounts to 12 degrees of freedom once
the mass-shell constraints of the b's and leptons are imposed. One approach to constructing
discriminants is to simply throw the 12 degrees of freedom, $p_x^b$, $p_y^b$, $p_z^b$, etc., into a
multivariate analysis and hope for the best. However, it makes more sense to consider
physically motivated variables. Azimuthal rotation invariance and the overall $p_T = 0$
constraint reduce the physical degrees of freedom to 9, and the Z and H invariant mass
constraints reduce the number down to 7. Since there are multiple ways physics can motivate
the choice of variables, this will result in many more than 7 variables. We use the standard
coordinate system with $\pm z$ pointing in the beam direction; $y$ is the rapidity, $\eta$ is the
pseudorapidity, and $\phi$ is the azimuthal angle.
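
The counting in the paragraph above can be written out step by step. This is only a restatement of the numbers given in the text, with the labels under the braces added for readability:

```latex
\[
\underbrace{4 \times 4}_{\text{four 4-momenta}}
  - \underbrace{4}_{\text{mass-shell conditions}} = 12,
\qquad
12 - \underbrace{1}_{\text{azimuthal rotation}}
   - \underbrace{2}_{\text{total } \vec{p}_T = 0} = 9,
\qquad
9 - \underbrace{2}_{m_Z,\; m_H \text{ constraints}} = 7 .
\]
```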

We first consider variables which are natural from the experimental point of view. These
include things like transverse momenta, invariant masses, rapidity differences, angular
separations, etc. These will be cataloged in Section 3.1. This can be thought of as a type of
bottom-up parametrization. The alternative is a top-down parametrization, motivated by the
physical process. One can start with variables that characterize the ZH production, such as
$\hat{s}$ and the production polar angle $\theta^*$. Then when the Higgs and Z decay, one can think about
various angles constructed from their decay products in their rest frame. This approach will
lead to variables discussed in Sections 3.2 and 3.3. Combining the bottom-up and top-down
ways of thinking leads to an even larger set of possible discriminants, discussed in Section 3.4.
Sections 3.5 to 3.10 describe some of the showered variables.



3.1 Lab-frame Kinematical Variables 

First, consider variables which are natural to define and measure in the lab frame. Using only
the 4-momenta of the b jets in the lab frame, we have

• $p_T$ of the two b's: $p_T^{b_1}$ for the higher-$p_T$, and $p_T^{b_2}$ for the lower-$p_T$ one

• $\Delta\eta_{b\bar{b}} = \eta_b - \eta_{\bar{b}}$

• $\Delta\phi_{b\bar{b}} = \phi_b - \phi_{\bar{b}}$ (properly wrapped to have a range between $-\pi$ and $\pi$)

• $\Delta R_{b\bar{b}} = \sqrt{(\Delta\eta_{b\bar{b}})^2 + (\Delta\phi_{b\bar{b}})^2}$

The same variables are also considered for the lepton system, with $\ell^+$ and $\ell^-$ replacing the $b\bar{b}$.
There, one can also treat the positively and negatively charged leptons separately, although
we found that sorting by $p_T$, as with the b's, tends to work better.
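
The angular variables in the list above can be computed as follows. This is a minimal sketch with hypothetical function names; the only subtle step is the wrapping of the azimuthal difference.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2 wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2) between two objects."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# Two jets on either side of the phi = pi branch cut are close, not 6 radians apart.
print(delta_phi(3.0, -3.0))
```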

There are variables involving the reconstructed Higgs boson with four-momentum $p_H^\mu =
p_b^\mu + p_{\bar{b}}^\mu$, and the reconstructed Z, with $p_Z^\mu = p_{\ell^+}^\mu + p_{\ell^-}^\mu$:

• $p_T^Z$ and $p_T^H$, the $p_T$ of the Z and Higgs boson. At the parton level these are the same,
but they end up slightly different due to jet reconstruction.

• $p_T^{CM}$: Center of Mass $p_T$, the magnitude of the vectorial sum of the 2 b-jet and 2 lepton
$p_T$'s. This is zero for the parton-level process, but non-zero after showering and jet
reconstruction. $p_T^{CM}$ is not included in our analysis, as discussed below in the missing
$E_T$ section.

We show a set of these variables in Figure 1. Some of these, such as $\Delta\eta_{b\bar{b}}$, appear to provide
very good discriminants. All of the variables are shown for anti-$k_T$ $R = 0.5$ jets. Iterative
jet algorithms, including anti-$k_T$, are defined by first assigning each energy deposition to
its own protojet. At each stage of the clustering, one calculates the distance between each pair
of protojets, defined by

$$d_{ij} = \min\left(k_{Ti}^{2p},\, k_{Tj}^{2p}\right) \frac{\Delta_{ij}^2}{R^2}, \qquad (3.1)$$

along with the beam distance of each protojet, defined by

$$d_{iB} = k_{Ti}^{2p}, \qquad (3.2)$$

where $\Delta_{ij}^2 = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$ and $k_{Ti}$, $y_i$, and $\phi_i$ are the transverse momentum, rapidity,
and azimuth of protojet $i$ (and likewise for $j$). The parameter $p = 1$ for the $k_T$ algorithm, $p = 0$ for
the Cambridge/Aachen algorithm, and $p = -1$ for the anti-$k_T$ algorithm. When one of the
distances between protojets is the smallest, those protojets are combined into another protojet
by adding their 4-vectors. When one of the beam distances is the smallest, that protojet gets
promoted to a real jet and removed from further consideration. Either way, all distances are
computed again for the new set of protojets, and the process repeats until all protojets have
been promoted.
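
The clustering procedure just described can be sketched in a few lines. This is a deliberately naive illustration that recomputes every distance at each step; it is not how a production package such as FastJet is implemented. The function names are hypothetical, and the inputs are assumed to be (E, px, py, pz) tuples with non-zero transverse momentum and finite rapidity.

```python
import math

def from_pt_y_phi(pt, y, phi):
    """Massless four-vector (E, px, py, pz) from pT, rapidity and azimuth."""
    return (pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))

def _pt2(p):
    return p[1] ** 2 + p[2] ** 2

def _rap(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def _delta2(a, b):
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (_rap(a) - _rap(b)) ** 2 + dphi ** 2

def cluster(particles, R=0.5, p=-1):
    """Generalized-kT clustering: p=1 kT, p=0 Cambridge/Aachen, p=-1 anti-kT."""
    protojets = [tuple(float(x) for x in q) for q in particles]
    jets = []
    while protojets:
        # Smallest beam distance d_iB = kT_i^(2p).
        best = min(range(len(protojets)), key=lambda i: _pt2(protojets[i]) ** p)
        d_min, pair = _pt2(protojets[best]) ** p, None
        # Smallest pairwise distance d_ij, if it beats the beam distance.
        for i in range(len(protojets)):
            for j in range(i + 1, len(protojets)):
                a, b = protojets[i], protojets[j]
                d_ij = min(_pt2(a) ** p, _pt2(b) ** p) * _delta2(a, b) / R ** 2
                if d_ij < d_min:
                    d_min, pair = d_ij, (i, j)
        if pair is None:
            jets.append(protojets.pop(best))  # promote protojet to a jet
        else:
            i, j = pair
            merged = tuple(x + y for x, y in zip(protojets[i], protojets[j]))
            protojets.pop(j)  # j > i, so remove j first
            protojets.pop(i)
            protojets.append(merged)  # E-scheme: add the four-vectors
    return sorted(jets, key=_pt2, reverse=True)
```

For example, two nearby particles and one well-separated particle give two jets for any of the three choices of `p`; the algorithms differ in the order in which softer particles are absorbed.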

[Figure 1 panels, axis labels as far as recoverable: $|\Delta\eta|$ (radians), $\Delta R$ (radians), one panel in GeV,
two $p_T$ panels (GeV), $|\Delta\phi|$ (radians)]

Figure 1: Some lab-frame kinematic variables for ZH signal (solid blue) and $Zb\bar{b}$ background (hashed
red) at the LHC. Events satisfy selection cuts and the Higgs mass window cut, 90 GeV < $m_{b\bar{b}}$ <
124 GeV. Horizontal axes are in radians or GeV as appropriate, and vertical axes are in arbitrary
units with signal and background normalized to the same area.

Some other variables inspired by previous Higgs boson searches include:

• acoplanarity $= \left|\pi - |\Delta\phi_{b\bar{b}}|\right| + \left|\pi - \Sigma\theta_{b\bar{b}}\right|$. Also for $\ell^+\ell^-$.

• $m_T = \sqrt{m_{b\bar{b}}^2 + p_x^2 + p_y^2}$: transverse mass of the $b\bar{b}$ system. Also for $\ell^+\ell^-$.

• $m_{\ell\ell b}$: invariant mass of the Z combined with a b-jet. The max or min over the two b's
can also be used.

• p »? balance = |pW +£§? + p? T - ft], for the WH search. 
3.2 Twist 

The strength of the $b\bar{b}$ angle discriminants $\Delta\eta$, $\Delta\phi$ and $\Delta R$ motivates us to explore the
$b\bar{b}$ system more carefully. Figure 2 shows the two-dimensional distribution of parton-level
events in the $(\Delta\eta, \Delta\phi)$ plane for the signal and background. It is clear from these plots
that the polar angle, which we call twist, would be a good discriminant. If we think of
$(\Delta\eta, \Delta\phi)$ as 2D Cartesian coordinates, then polar coordinate combinations are the familiar
$\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$, and the twist angle:

$$\tau = \tan^{-1}\frac{\Delta\phi}{\Delta\eta}. \qquad (3.3)$$

Twist is a longitudinal-boost-invariant version of the rotation of the $H/b/\bar{b}$ plane with respect
to the beam/$H$ plane. This is illustrated in Figure 3. The twist angle is zero when the particles
are separated along the cylinder in $\eta$, and $\pi/2$ when separated around the cylinder in $\phi$.
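
A minimal sketch of the twist angle follows. Taking absolute values of both differences, so that the result lies between 0 and $\pi/2$, is an assumption made here for b and $\bar{b}$ jets that are not distinguished; the function name is hypothetical.

```python
import math

def twist(eta1, phi1, eta2, phi2):
    """Twist angle in [0, pi/2]: 0 for separation in eta, pi/2 for separation in phi."""
    deta = abs(eta1 - eta2)
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi  # wrap |d_phi| into [0, pi]
    # atan2 handles deta = 0 (pure phi separation) without a division by zero.
    return math.atan2(dphi, deta)
```

Because both $\Delta\eta$ and $\Delta\phi$ are unchanged by boosts along the beam axis, the value returned is the same in any longitudinally boosted frame.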

The shape of the signal and background twist distributions in Figure 2 can be understood
as follows. For low Higgs $p_T$, $p_T \ll m_H$, the signal lives in bands clustered along $|\Delta\phi| \sim \pi$,
and hence $\tau \sim \pi/2$. For $p_T > m_H$, the signal lives in a ring of $\Delta R \sim 2 m_H / p_T$, and twist
is a powerful variable orthogonal to the radial direction. For higher $p_T$, the signal becomes
somewhat more uniformly distributed in twist, but still retains its $\tau = \pi/2$ preference. This is
all purely a consequence of the spherically-symmetric Higgs boson decay boosted transverse
to the beam. In contrast, for the background, in particular the gg-initiated background, there
is a bias for one of the b's to have large rapidity, which leads to low twist. This stems from
t-channel type singularities in the $Zb\bar{b}$ production matrix elements (in the limit of massless
b's). At higher $p_T$, the background retains much of its $\tau = 0$ preference. The distributions in
Figure 2 reflect an admixture of the high and low $p_T$ twist behaviors, biased towards the low
$p_T$ behavior, which is where most of the events lie.

[Figure 2 panels: Higgs Boson Signal (left); Background Initiated by gg (right)]

Figure 2: $\Delta\eta_{b\bar{b}}$ vs $\Delta\phi_{b\bar{b}}$ for the Higgs boson signal (left) and the gg-initiated $Zb\bar{b}$ background dominant
at the LHC (right). This is at the hard parton level, and for $p_T > 50$ GeV. The difference is less
dramatic for lower $p_T$ or for the Tevatron, where the $q\bar{q}$-initiated background dominates. Absolute
values could also be taken for $b$ indistinguishable from $\bar{b}$.

[Figure 3 panels: Signal-Like Twist, $\tau = \pi/2$; Background-Like Twist, $\tau = 0$]

Figure 3: Twist angle $\tau$ in 3D with the $b$ and $\bar{b}$ emerging from the interaction point. The twist angle
is defined to be boost invariant and does not exactly correspond to the physical rotation angle of a
plane. The case shown, however, has no longitudinal boost.

We show in Figure 4 the twist distributions for signal and background for the $b\bar{b}$ system
and the $\ell^+\ell^-$ system. The top panels show the twist distributions at the MadGraph level
before showering and cuts, and the bottom panels after jets are reconstructed and detector-level
cuts have been applied. We can see that the discriminating power of twist for
the $b\bar{b}$ system is reduced by requiring relatively central jets with a minimal $p_T$. However,
the background's bias towards the beam is still evident, and twist still provides a useful
discriminant which we will incorporate into our multivariate analysis. Because of its physical
motivation, we also suspect the twist angle could have much wider applicability than to the
ZH and WH searches we consider here.

[Figure 4 panels: top row, MadGraph hard parton level with no cuts; bottom row, showered and jetted,
with detector cuts. Horizontal axes: $\tau_{b\bar{b}}$ twist angle (radians) and $\tau_{\ell\ell}$ twist angle (radians),
from 0 to $\pi/2$]

Figure 4: Twist angle distributions for ZH signal (solid blue) and $Zb\bar{b}$ background (hashed red), for
the LHC with no cut. MadGraph hard parton-level with no cuts (top) and showered jet-level with
detector cuts (bottom). Both are shown only in the Higgs mass-window, 90 GeV < $m_{b\bar{b}}$ < 124 GeV.
Vertical axes are in arbitrary units with signal and background normalized to the same area.

3.3 Helicity and Azilicity Angles 

Next, we begin to consider variables motivated by the on-shell decays of the Higgs and the
Z. In their rest frames, each decay is parametrized by only two angles, $\theta$ and $\phi$. Since the
Higgs boson is a scalar, its decay products are spherically symmetric and therefore distributed
uniformly in $\phi$ and $\cos\theta$. Specifically, in the rest frame of the scalar Higgs, the $b$ and $\bar{b}$ quarks
travel in opposite directions with a fixed energy ($m_H/2$, up to b mass corrections). The rest
frame of any fake Higgs boson formed by two b-jets will also have two oppositely-going b's in
the $b\bar{b}$ center-of-mass frame. If we impose that $m_{b\bar{b}}$ be close to the Higgs mass, in this frame,
the background b's will have energies close to $m_H/2$ just like the signal, but they will not be
distributed in a spherically symmetric way.

[Figure 5 labels: H boost direction; beam; Z direction]

Figure 5: Helicity angle $\theta_h$ and azilicity angle $\phi_a$ for $H \to b\bar{b}$. Angles can also be defined for the
leptons on the $Z \to \ell^+\ell^-$ side of the event.

Parameterizing the H decay with two spherical angles requires a convention for the axes.
Our convention is motivated by the helicity angle used in W and top studies [?]. The angles
are constructed in the rest frame of the Higgs boson, and refer to directions defined by the
Z and beam 3-momenta observed in this frame. We choose "latitude" to be measured by
the helicity angle, $\theta$, with the south pole ($\theta = \pi$) defined to be the direction aligned with
the Z (equivalently, the direction along which the ZH parent CM system moves). Given
this angle, we still have the freedom to choose any direction perpendicular to the axis as the
$\phi = 0$ direction. We choose $\phi = 0$ to be pointing toward the beam which was moving along
$+z$ in lab frame coordinates. The QCD background will favor $\phi = 0$ and $\phi = \pi$. With this
convention, we call longitude the azilicity angle, since it is the azimuthal $\phi$ angle on the
sphere whose north pole is defined when the helicity angle vanishes. Azilicity is equivalently
the angle between the Higgs boson decay plane and the ZH production plane (constructed
with reference to the beams) as viewed in the event's center of momentum frame. Since $b$
and $\bar{b}$ are indistinguishable, this angle can be chosen to go between 0 and $\pi/2$. A cartoon of
these angles is shown in Figure 5.

The helicity and azilicity angles offer the promise of very strong discrimination power,
because they are directly tied to physical features of the signal. Indeed, in the top row of
Figure 6, we can see that, at the parton level with no cuts, there are strong singularities at
both $\theta = 0$ and $\phi = 0$ for the background, while the signal distributions are flat, as expected.
Detector observability cuts remove the most singular background contributions, but still lead
to distributions that show some remaining discriminating power.
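
The angle conventions described above can be sketched as follows. This is an illustrative reading of the stated convention rather than the authors' code: the function names are hypothetical, four-vectors are assumed to be (E, px, py, pz), the beam is represented by a light-like vector along $+z$, and the degenerate case of a b exactly along the polar axis is not handled.

```python
import numpy as np

def boost_to_rest_frame(p, frame):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of `frame`."""
    p = np.asarray(p, dtype=float)
    frame = np.asarray(frame, dtype=float)
    beta = frame[1:] / frame[0]
    b2 = beta @ beta
    if b2 == 0.0:
        return p.copy()
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]
    e_new = gamma * (p[0] - bp)
    vec_new = p[1:] + ((gamma - 1.0) * bp / b2 - gamma * p[0]) * beta
    return np.concatenate(([e_new], vec_new))

def helicity_azilicity(p_b, p_bbar, p_z, beam=(1.0, 0.0, 0.0, 1.0)):
    """Return (helicity angle, azilicity angle) of the b in the Higgs rest frame."""
    p_h = np.asarray(p_b, dtype=float) + np.asarray(p_bbar, dtype=float)
    b_vec = boost_to_rest_frame(p_b, p_h)[1:]
    z_vec = boost_to_rest_frame(p_z, p_h)[1:]
    beam_vec = boost_to_rest_frame(beam, p_h)[1:]
    # South pole (theta = pi) is along the Z, so the north pole is opposite to it.
    north = -z_vec / np.linalg.norm(z_vec)
    b_hat = b_vec / np.linalg.norm(b_vec)
    theta = np.arccos(np.clip(b_hat @ north, -1.0, 1.0))
    # phi = 0 points toward the beam, projected perpendicular to the polar axis.
    b_perp = b_hat - (b_hat @ north) * north
    beam_perp = beam_vec - (beam_vec @ north) * north
    cos_phi = (b_perp @ beam_perp) / (np.linalg.norm(b_perp) * np.linalg.norm(beam_perp))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    # b and bbar are not distinguished: fold the azilicity into [0, pi/2].
    phi = min(phi, np.pi - phi)
    return float(theta), float(phi)
```

Swapping the b and $\bar{b}$ maps $\theta \to \pi - \theta$, so in practice one might also fold the helicity angle or work with $|\cos\theta|$; that choice is left open here.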



3.4 Kinematic variable construction 

We have described a number of variables constructed out of the 4-momenta of the jets and
leptons in various frames. A large number of combinations can be formed from
the measurable kinematic variables. However, using physically motivated variables can help
steer the automated process in the right direction. By searching for useful combinations up front,
a neural network, for example, does not have to "discover" how to take an invariant mass or
boost to the Higgs boson rest frame.

Figure 6: Helicity angle $\theta$ and azilicity angle $\phi$ for the b in the Higgs boson rest frame, for ZH
signal (solid blue) and $Zb\bar{b}$ background (hashed red) at the LHC. MadGraph hard parton-level with
no cuts (top) and showered jet-level with detector cuts (bottom). Both are shown only in the Higgs
mass-window, 90 GeV < $m_{b\bar{b}}$ < 124 GeV.

Thus, we now consider various unintuitive combinations of variables. The procedure is:

• Pick a particle: high-$p_T$ b-jet, low-$p_T$ b-jet, high-$p_T$ lepton, low-$p_T$ lepton, Higgs, Z.

• Optionally transform to a boosted frame: Higgs, Z, System Center of Mass (CM).

• Optionally rotate the polar axis to point along the initial direction of the particle whose
frame you are in (as for helicity and azilicity angles).

• Pick a kinematic property: $p_T$, $\eta$, $\phi$, $\cos\theta$, etc.

• Optionally pick a second particle to form a sum or difference, sometimes with a coordinate
transformation as in $\Delta R$ and twist $\tau$, and sometimes with a more complicated
combination as in invariant mass.

• For vector quantities, optionally take the magnitude of vector sums, $|\vec{p}_1 \pm \vec{p}_2|$, or scalar
sums, $|\vec{p}_1| \pm |\vec{p}_2|$.
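
The menu above lends itself to automatic enumeration. The sketch below only generates variable names for a reduced version of the menu (no axis rotation, no vector sums, no pruning of trivial entries such as a particle in its own rest frame); the labels and the naming scheme are hypothetical.

```python
from itertools import combinations, product

# Hypothetical labels for the menu entries described in the text.
PARTICLES = ["b1", "b2", "lep1", "lep2", "H", "Z"]  # high/low pT b-jet and lepton, Higgs, Z
FRAMES = ["lab", "H_rest", "Z_rest", "CM"]
PROPERTIES = ["pT", "eta", "phi", "cos_theta"]
PAIR_OPS = ["sum", "diff"]

def single_particle_variables():
    """One particle, one frame, one kinematic property."""
    return [f"{prop}[{part}|{frame}]"
            for part, frame, prop in product(PARTICLES, FRAMES, PROPERTIES)]

def pair_variables():
    """Sum or difference of one property between two distinct particles."""
    return [f"{op}({prop})[{a},{b}|{frame}]"
            for (a, b), frame, prop, op
            in product(combinations(PARTICLES, 2), FRAMES, PROPERTIES, PAIR_OPS)]

print(len(single_particle_variables()), len(pair_variables()))  # 96 480
```

Even this reduced menu yields several hundred candidates, which indicates how the full procedure reaches thousands of single-variable discriminants.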
Some stranger kinematic variables that prove to be useful in the multi-variable analysis
include:

• $\Sigma p_T^{b_1 b_2} = |p_T^{b_1}| + |p_T^{b_2}|$: Sum of magnitudes of $p_T$'s of the two b-jets.

• $\Sigma p_T^{b_1 \ell_2} = |p_T^{b_1}| + |p_T^{\ell_2}|$: Sum of magnitudes of $p_T$'s of the higher-$p_T$ b-jet and the lower-$p_T$
lepton.

• $\Delta p_T^{Z \ell_1} = |p_T^{Z}| - |p_T^{\ell_1}|$: Difference in magnitude of $p_T$ between Z and higher-$p_T$ lepton.

• $\Delta p_T^{b_1 \ell_2} = |p_T^{b_1}| - |p_T^{\ell_2}|$: Difference in magnitude of $p_T$ between higher-$p_T$ b-jet and
lower-$p_T$ lepton.

• $\Delta\eta_{b_1 \ell_2}$: Difference in $\eta$ between the higher-$p_T$ b-jet and the lower-$p_T$ lepton.

• $\Delta y_{H,b_1}$ and $\Delta y_{H,b_2}$: Difference in rapidity between H and higher-$p_T$ or lower-$p_T$ b-jet.

• $\cos(\theta_{b_2})$: Center of Mass frame $\cos(\theta)$ of the lower-$p_T$ b-jet. Same for higher-$p_T$ b-jet.

We show the distributions for a number of these Menu-Method variables in Figure 7.

3.5 Radiation Variables 

The above variables are constructed out of the 6-jet momenta and the lepton momenta. These 
are what we have been calling the kinematic variables. In addition, there are what we call the 
radiation variables, which are dependent on the radiation pattern of the event. The radiation 
variables generally have fixed or meaningless values at the hard parton level, so they are 
almost entirely complementary to the hard variables. Some examples include: 

• Mass of each 6-jet and the jet mass-to-pr ratio, where the jet's 4-vector is the sum of 
its components' 4- vectors (the "E-Scheme"). 

• Rapidity y in addition to pseudorapidity r\ of each massive 6-jet. 

• Subjet multiplicity for each 6-jet, with different subjet algorithms and sizes. 

• Average pt of the small subjets within each 6-jet. 

• The pt of hardest, 2 nd hardest, and 3 rd hardest subjets within each 6-jet. 

• Radial moments ("girth") of each 6-jet (see Section [O] below). 

• Jet Angularities (see Section [D| below) . 



• Planar Flow of each 6-jet. This was not found to be useful. See [2C] for the definition. 

• Pull of each b-jet (see Section 3.7 below).

[Figure 7 panels: two $\Sigma p_T$ variables (GeV) and one $\Delta y$ variable (radians).]

Figure 7: A selection of various variable distributions for ZH signal (solid blue) and Zbb background
(hashed red) at the LHC. Events satisfy selection cuts and the Higgs mass-window cut,
90 GeV < $m_{bb}$ < 124 GeV. Horizontal axes are in radians or GeV as appropriate, and vertical axes
are in arbitrary units with signal and background normalized to the same area.

• Extra jets in the event. Extra jets were not found useful. However, we have not
attempted the analysis with a proper matched sample, so we cannot make a strong
statement about extra jets. We suspect that this is an important issue for removing the
$t\bar{t}$ contamination, but not so much for the irreducible backgrounds we consider here.

Some of the more powerful of these variables are shown in Figure 8.

[Figure 8 panels: stddev $p_T$ of b1's anti-$k_T$ 0.1 subjets over the jet's $p_T$; highest $p_T$
anti-$k_T$ 0.1 subjet over the jet's $p_T$; $p_T$ of 2nd highest $p_T$ anti-$k_T$ 0.02 subjet of b1;
3rd highest $p_T$ anti-$k_T$ 0.2 subjet over the jet's $p_T$.]

Figure 8: Some subjet variable distributions for ZH signal (solid blue) and Zbb background (hashed
red) at the LHC. Events satisfy selection cuts and the Higgs mass-window cut,
90 GeV < $m_{bb}$ < 124 GeV. Horizontal axes are in radians or GeV as appropriate, and vertical axes
are in arbitrary units with signal and background normalized to the same area.



3.6 Radial Moments: Girth and Jet Angularities 




[Figure 9 panels: girth of b2; angularity of b1 with a = -0.90; angularity of b1 with a = -0.10.]

Figure 9: Girth and angularity distributions for ZH signal (solid blue) and Zbb background (hashed
red) at the LHC. Events satisfy selection cuts and the Higgs mass-window cut,
90 GeV < $m_{bb}$ < 124 GeV. Horizontal axes are in radians or GeV as appropriate, and vertical axes
are in arbitrary units with signal and background normalized to the same area.



The distribution of particles within a jet can be useful for distinguishing jets
initiated by different flavors of quark or by a gluon. Even in our signal and irreducible
background with two b-tagged jets, this distribution has proven useful.

One infrared safe way of characterizing the jets is to integrate (sum) the energy or $p_T$
distribution against a radially symmetric profile. Different choices of profile and overall
normalization lead to different observables. Distances $r_i$ of each particle or cell are calculated
in $(y, \phi)$ space with respect to the location of the jet. The jet location $(y, \phi)$ is defined by
the anti-$k_T$ algorithm 'E-scheme' as the 4-vector sum of all inputs (particles or calorimeter
towers). It is important to use rapidity (rather than pseudorapidity) for the jet location because
the jet is massive in this scheme. A radial moment sums these distances (or a function of these
distances), weighted by a quantity like $p_T$, then normalized to the total $p_T$ of the jet. For
example, the linear radial moment, girth, is defined as

Girth : $g = \sum_{i \in \text{jet}} \frac{p_T^i \, |r_i|}{p_T^{\text{jet}}}$ . (3.4)

The girth distribution is shown in Figure 9.

Jet Angularities are also radial moments, but their "radial distances" are rescaled into
the angular coordinates appropriate for $e^+e^-$ event shapes. They are defined by [20]

Jet Angularities : $A_a = \frac{1}{m_{\text{jet}}} \sum_{i \in \text{jet}} E_i \, f_a\!\left(\frac{\pi |r_i|}{2R}\right)$ , (3.5)

with

$f_a(\theta) = \sin^a\theta \, (1 - \cos\theta)^{1-a}$ , (3.6)

with $a < 2$. The kernel function $f_a(\theta)$ is inspired by full event-shape angularities [21], but
modified so that the edge of a jet at $|r_i| = R$ is mapped to $\pi/2$. Profiles for different choices of
the $a$ parameter are shown in Figure 10. Note that the energies $E_i$ are used in the definition,
instead of $p_T$'s, and the angularities are normalized by the jet mass. Radial moments like jet
angularities and girth are especially interesting because it may be possible to calculate them
accurately in QCD, see for example [22].

Figure 11: Color connections for a signal-like event ($pp \to H \to b\bar{b}$) on the left and a
background-like event ($pp \to b\bar{b}$) on the right. Our signal and background each have a colorless
Z or W (not shown) radiating from one of the hard quark lines and decaying to leptons. This doesn't
affect the color flow.
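
To make the radial moments of Eqs. (3.4)-(3.6) concrete, the following is a minimal Python sketch that evaluates girth and a jet angularity from a list of jet constituents. The dictionary keys (`pt`, `E`, `y`, `phi`), the function names, and the choice to set the kernel to zero at $\theta = 0$ (its limiting value for $a < 2$) are assumptions made for this illustration; they are not taken from the analysis code.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2 wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def radial_distance(constituent, jet_axis):
    """|r_i|: distance in the (y, phi) plane between a constituent and the jet axis."""
    dy = constituent["y"] - jet_axis["y"]
    dphi = delta_phi(constituent["phi"], jet_axis["phi"])
    return math.hypot(dy, dphi)


def girth(constituents, jet_axis, jet_pt):
    """Linear radial moment: sum of pT-weighted distances over the jet pT."""
    return sum(c["pt"] * radial_distance(c, jet_axis) for c in constituents) / jet_pt


def angularity_kernel(theta, a):
    """f_a(theta) = sin^a(theta) * (1 - cos(theta))^(1 - a), for a < 2."""
    if theta == 0.0:
        return 0.0  # limiting value for a < 2; avoids 0**negative when a < 0
    return math.sin(theta) ** a * (1.0 - math.cos(theta)) ** (1.0 - a)


def angularity(constituents, jet_axis, jet_mass, jet_radius, a):
    """Energy-weighted kernel sum, with |r_i| = R mapped to theta = pi/2."""
    total = 0.0
    for c in constituents:
        theta = math.pi * radial_distance(c, jet_axis) / (2.0 * jet_radius)
        total += c["E"] * angularity_kernel(theta, a)
    return total / jet_mass


if __name__ == "__main__":
    axis = {"y": 0.0, "phi": 0.0}
    toy_jet = [
        {"pt": 30.0, "E": 30.0, "y": 0.1, "phi": 0.0},
        {"pt": 10.0, "E": 10.0, "y": 0.0, "phi": -0.3},
    ]
    print("girth =", girth(toy_jet, axis, jet_pt=40.0))
    print("A(a=-0.9) =", angularity(toy_jet, axis, jet_mass=8.0, jet_radius=0.5, a=-0.9))
```

In a real analysis the jet axis, $p_T$ and mass would come from the E-scheme sum of the constituents; here they are passed in by hand to keep the sketch short.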



3.7 Pull 

Pull tries to capture the difference in color structure between the Higgs boson signal and the
QCD background. It was introduced in Ref. [?], and then immediately used in the D0 search
[?] for ZH with $Z \to \nu\bar{\nu}$.

To leading order in the number of colors (up to $1/N_{\text{colors}}^2 \sim 10\%$ corrections), quarks
can be described as being "color-connected" to other quarks by a "color string" [?].
This approximation governs much of the parton shower. The color-singlet Higgs boson decays
to two b-quarks that are color-connected to each other, while b's from the background
are color-connected to the proton remnants that travel down the beam pipe. This is shown
schematically in Figure 11. This difference is independent of the event kinematics, and
therefore, if observable, should be complementary to kinematical variables and useful in a
multi-variable search.

The pull vector is designed to measure color flow. It is a $p_T$-weighted moment vector
that tends to point toward the color-connected partner of the jet's initiating quark. The pull
vector is defined as

Pull Vector : $\vec{t} = \sum_{i \in \text{jet}} \frac{p_T^i \, |r_i|}{p_T^{\text{jet}}} \, \vec{r}_i$ where $\vec{r}_i = (y_i - y_{\text{jet}}, \phi_i - \phi_{\text{jet}})$ . (3.7)

Without the factor of $|r_i|$, this would be the jet's $p_T$-weighted centroid. For fixed kinematics,
pull has been shown to help separate signal from background [?]. For example, if we fix the
b's to have $(\Delta\eta, \Delta\phi) = (1, 2)$ with $p_T = 200$ GeV, the average $p_T$ measured in the
calorimeter is shown for signal and background in Figure 12. The difference in $p_T$ distributions
around the jets holds up even for individual events. The most effective way to use the pull vector
is to calculate a pull angle, which is the angle between the pull vector and some other vector
in the event. We will now point out some technical details that help make pull more effective,
and define some example pull angles, such as the ones drawn in Figure 12, which are defined
in Table 3.

We found it important to use anti-$k_T$ jets because the highest energy jets in the event
tend to be circular, whereas the split-merge procedure of a cone-jet algorithm can nibble away

[Figure 12 panels: Signal Accumulated $p_T$ (left); Background Accumulated $p_T$ (right).]

Figure 12: A single parton-level signal event showered millions of times (left) and a single
parton-level background event with identical kinematics but different color connections showered
millions of times (right). The color shows the average showered $p_T$ density in $(\eta, \phi)$ for an
ensemble of events with fixed parton-level kinematics. (Contours are stepped in factors of two.)
Results for $ZH \to Zb\bar{b}$ are displayed on the left, and $gg \to Zb\bar{b}$ on the right. The
underlying color connections are shown with thin lines, and examples of pull vectors (which would be
defined event-by-event) with arrows. The angles $\alpha$ and $\beta$ are illustrated as the angles of
the pull vectors with respect to the "signal-like" and "background-like" color connection lines.



Signal pull angles:
    $\alpha_1$ = angle between the high-$p_T$ jet's pull and the direction to the low-$p_T$ jet
    $\alpha_2$ = angle between the low-$p_T$ jet's pull and the direction to the high-$p_T$ jet

Signal pull distance:
    $\alpha = \sqrt{\alpha_1^2 + \alpha_2^2}$

Background pull angles:
    $\beta_1$ = angle between the high-$p_T$ jet's pull and the direction to the nearest beam
    $\beta_2$ = angle between the low-$p_T$ jet's pull and the direction to the nearest beam

Background pull distance:
    $\beta = \sqrt{\beta_1^2 + \beta_2^2}$

Table 3: Definition of useful pull angles.



pieces and combine them with an adjacent jet. If that adjacent jet came from a perturbative
QCD radiation (due to the color-connection in the dipole picture), these are exactly the pieces
that should be contributing most to the pull. It's also important that real rapidity $y$ be used,
as opposed to pseudo-rapidity $\eta$. While this is less important for each calorimeter cell, the
4-momentum of a jet can be massive when constructed by adding together massless calorimeter
4-momenta (the "E-Scheme"). If anti-$k_T$ jets are not available, the next best option is to use
a circular area in the $y$-$\phi$ plane around the jet's $y$-$\phi$ centroid.

The 2D distribution of the $y$ and $\phi$ components of the pull vector looks like a Gaussian
whose peak is shifted slightly away from the origin in the direction of the color-connected



[Figure 13 panels: pull of high-$p_T$ b-jet: $\alpha_1$; pull of low-$p_T$ b-jet: $\alpha_2$; pull
signal-distance: $\alpha$.]

Figure 13: Pull distributions for ZH signal (solid blue) and Zbb background (hashed red) at the LHC.
Events satisfy selection cuts and the Higgs mass-window cut, 90 GeV < $m_{bb}$ < 124 GeV. Horizontal
axes are in radians, and vertical axes are in arbitrary units with signal and background normalized to
the same area.



object. The magnitude of the vector does not have as much distinguishing power as its angle. 
The pull angle of each jet, however, does not have much power without comparing it to where 
it "should" point: toward the other jet for the signal, and toward one of the beams (usually 
the closest) for the background. 

The direction toward the other b-jet is the twist angle, defined in Section 3.2. A version
of twist that goes from 0 to $2\pi$ is defined as the direction of the lower-$p_T$ jet from the location
of the higher-$p_T$ jet in the $y$-$\phi$ plane. The 3D distribution of the pull angles of each jet along
with this twist angle contains all of the useful pull information that can be used to separate
signal from background, but the individual pull angles for a given event are meaningful only
with respect to the twist angle. In an attempt to expose the physics and make the job of
a multivariate discriminator easier, we begin by defining four variables that more directly
capture where the pulls "should" point for the signal and background, labeled in Figure 12
and defined in Table 3.

Subtracting the pull vectors is not useful, since the magnitude of each vector does not
carry much meaning, and the magnitudes are independent of each other. Many other attempts
to combine the pull angles into a single variable also sacrifice discrimination power. The best
we have achieved is the "pull distance," the square root of the sum of squares of the difference
angles. These pull angles are defined in Table 3.
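
The pull vector of Eq. (3.7) and the pull angles and distances of Table 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the constituent keys (`pt`, `y`, `phi`) and function names are hypothetical, angles are taken unsigned in $[0, \pi]$, and the "nearest beam" is assumed to be the beam on the same side in rapidity as the jet, pointing along $(\pm 1, 0)$ in the $y$-$\phi$ plane. The sign and range conventions actually used in the analysis may differ.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2 wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def pull_vector(constituents, jet_y, jet_phi, jet_pt):
    """Sum over constituents of (pT_i |r_i| / pT_jet) * r_i, in (y, phi)."""
    t_y = t_phi = 0.0
    for c in constituents:
        dy = c["y"] - jet_y
        dphi = delta_phi(c["phi"], jet_phi)
        weight = c["pt"] * math.hypot(dy, dphi) / jet_pt
        t_y += weight * dy
        t_phi += weight * dphi
    return (t_y, t_phi)


def angle_between(v, w):
    """Unsigned angle in [0, pi] between two vectors in the (y, phi) plane."""
    cross = v[0] * w[1] - v[1] * w[0]
    dot = v[0] * w[0] + v[1] * w[1]
    return abs(math.atan2(cross, dot))


def pull_angles(pull1, pull2, jet1, jet2):
    """Pull angles for the higher-pT jet (1) and lower-pT jet (2).

    jet1 and jet2 are (y, phi) locations; pull1 and pull2 are pull vectors.
    """
    to_jet2 = (jet2[0] - jet1[0], delta_phi(jet2[1], jet1[1]))
    to_jet1 = (-to_jet2[0], -to_jet2[1])
    alpha1 = angle_between(pull1, to_jet2)
    alpha2 = angle_between(pull2, to_jet1)
    # Assumption: the nearest beam lies on the same side in rapidity as the jet.
    beta1 = angle_between(pull1, (math.copysign(1.0, jet1[0]), 0.0))
    beta2 = angle_between(pull2, (math.copysign(1.0, jet2[0]), 0.0))
    return {
        "alpha1": alpha1, "alpha2": alpha2, "alpha": math.hypot(alpha1, alpha2),
        "beta1": beta1, "beta2": beta2, "beta": math.hypot(beta1, beta2),
    }


if __name__ == "__main__":
    jet1, jet2 = (0.3, 0.0), (1.3, 2.0)
    cons1 = [{"pt": 60.0, "y": 0.3, "phi": 0.0}, {"pt": 20.0, "y": 0.5, "phi": 0.3}]
    cons2 = [{"pt": 40.0, "y": 1.3, "phi": 2.0}, {"pt": 10.0, "y": 1.1, "phi": 1.7}]
    t1 = pull_vector(cons1, jet1[0], jet1[1], 80.0)
    t2 = pull_vector(cons2, jet2[0], jet2[1], 50.0)
    print(pull_angles(t1, t2, jet1, jet2))
```

In this toy event each jet's soft constituent sits between the two jets, so the signal pull distance comes out small, which is the signal-like configuration described in the text.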
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3.8 Total Energy Variables 

Next, we consider general purpose variables which look at the event as a whole. Examples
include

• $\sqrt{\hat{s}}$ = CM energy for hard collision, or invariant mass of the reconstructed Higgs boson
and Z.

• $H_T$ = Scalar sum of all $E_T$.

• $H_z$ = Boost of the center-of-mass system along the beam.

• $\Sigma p_T$ = Scalar sum of all $p_T$ (which differs from $H_T$ for massive jets).

• $E_{\text{vis}}$ = Scalar sum of all visible energy.

• Centrality = $\Sigma p_T / E_{\text{vis}}$.

For each of these variables, the quantity can be constructed from the complete event, summing
over all particles in the event. Particles here can mean topo-clusters or calorimeter cells or
energy deposits coarse-grained into 0.1 x 0.1 cells in $(\eta, \phi)$. In this study we use the
4-momenta of the stable particles in the event record for what we call the particle variables. The
same variables can also be constructed using just the reconstructed objects (jets, leptons and
photons) or even just the four primary objects (2 b-jets and 2 leptons). We find that using
objects generally works much better than using variables constructed from energy in the
complete event. A particularly useful variable is centrality constructed from the four primary
objects. Some energy variables are shown in Figure 14.
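
As a minimal sketch of how these whole-event quantities can be built from any chosen set of 4-momenta (all particles, reconstructed objects, or the four primary objects), consider the Python function below. The function name, the tuple layout `(E, px, py, pz)`, and the definition $E_T = E \sin\theta = E\, p_T / |\vec{p}|$ are assumptions made for this example.

```python
import math


def energy_variables(four_momenta):
    """Whole-event energy variables from (E, px, py, pz) tuples in GeV."""
    sum_pt = h_t = e_vis = 0.0
    for energy, px, py, pz in four_momenta:
        pt = math.hypot(px, py)
        p = math.sqrt(px * px + py * py + pz * pz)
        sum_pt += pt
        # Assumed definition: E_T = E * sin(theta) = E * pT / |p| (0 if at rest).
        h_t += energy * pt / p if p > 0.0 else 0.0
        e_vis += energy
    return {
        "H_T": h_t,
        "sum_pT": sum_pt,
        "E_vis": e_vis,
        "centrality": sum_pt / e_vis,
    }


if __name__ == "__main__":
    # Toy "four primary objects": two massive b-jets and two massless leptons.
    primary = [
        (105.0, 60.0, 20.0, 80.0),
        (70.0, -40.0, 10.0, -55.0),
        (50.0, 30.0, -40.0, 0.0),
        (25.0, -15.0, 0.0, 20.0),
    ]
    print(energy_variables(primary))
```

For massless inputs $H_T$ and $\Sigma p_T$ coincide; they differ once massive jets are included, as noted in the list above.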

3.9 Event Shape Variables 

Finally, we look at some event shape variables. The ones we consider involve eigenvalues of
two similar tensors composed of 3-momenta, with the sum over the same sets of particles,
reconstructed objects, or four primary objects, as above:

Sphericity Tensor $= \frac{1}{\sum_i |\vec{p}_i|^2} \sum_i \begin{pmatrix} p_x p_x & p_x p_y & p_x p_z \\ p_y p_x & p_y p_y & p_y p_z \\ p_z p_x & p_z p_y & p_z p_z \end{pmatrix}$ ,

Spherocity Tensor $= \frac{1}{\sum_i |\vec{p}_i|} \sum_i \frac{1}{|\vec{p}_i|} \begin{pmatrix} p_x p_x & p_x p_y & p_x p_z \\ p_y p_x & p_y p_y & p_y p_z \\ p_z p_x & p_z p_y & p_z p_z \end{pmatrix}$ , (3.8)

where the particle labels $i$ on the momentum components appearing in the matrices are implicit.

The eigenvalues of these matrices are computed, then ordered and normalized,
$\lambda_1 \ge \lambda_2 \ge \lambda_3$ with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. The event shapes are then defined as

• Sphericity and Spherocity: $S = \frac{3}{2}(\lambda_2 + \lambda_3)$ where $0 \le S \le 1$. A 2-jet event has
$S \approx 0$ while an isotropic one has $S \approx 1$.

• Aplanarity and Aplanority: $A = \frac{3}{2}\lambda_3$ where $0 \le A \le 1/2$. A planar event has
$A \approx 0$ while an isotropic one has $A \approx 1/2$.

Figure 14: Total energy and event shape variables for ZH signal (solid blue) and Zbb background
(hashed red) at the LHC. Events satisfy selection cuts and the Higgs mass-window cut,
90 GeV < $m_{bb}$ < 124 GeV. Horizontal axes are in GeV where appropriate, and vertical axes are in
arbitrary units with signal and background normalized to the same area.



• Y variable from Sphericity: $Y = \frac{\sqrt{3}}{2}(\lambda_2 - \lambda_3)$.

• DShape from Spherocity: $D = 27\lambda_1\lambda_2\lambda_3$.

Sphericity, Spherocity, and DShape are all highly correlated; Aplanarity and Aplanority are
too, as are the shapes derived from the four primary objects in the lab and CM frame.
Aplanarity seems to be the most useful shape variable. The others seem useful only to the
extent they are correlated with Aplanarity or Centrality. Some event shapes are shown in
Figure 14. We also consider the Fox-Wolfram moments of particles, objects, and primary
objects. These are defined by projecting against the Legendre polynomials

$H_\ell = \frac{1}{E_{\text{vis}}^2} \sum_{i,j} |\vec{p}_i| |\vec{p}_j| P_\ell(\cos\theta_{ij})$ . (3.9)

The Fox-Wolfram moments were not found particularly useful.
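
The tensor-based event shapes of this section can be illustrated with a short, dependency-free Python sketch. It assumes the standard normalizations (the quadratic tensor divided by $\sum_i |\vec{p}_i|^2$, the linearized one weighted by $1/|\vec{p}_i|$ and divided by $\sum_i |\vec{p}_i|$) and uses a textbook closed-form solution for the eigenvalues of a symmetric 3x3 matrix. All function names are hypothetical, and a production analysis would more likely call a linear-algebra library.

```python
import math


def momentum_tensor(momenta, linearized=False):
    """Normalized 3x3 tensor from (px, py, pz) tuples.

    linearized=False: sum of p_a p_b over sum of |p|^2.
    linearized=True:  sum of p_a p_b / |p| over sum of |p|.
    """
    tensor = [[0.0] * 3 for _ in range(3)]
    norm = 0.0
    for p in momenta:
        mag = math.sqrt(sum(c * c for c in p))
        if mag == 0.0:
            continue
        weight = 1.0 / mag if linearized else 1.0
        norm += mag if linearized else mag * mag
        for a in range(3):
            for b in range(3):
                tensor[a][b] += weight * p[a] * p[b]
    return [[x / norm for x in row] for row in tensor]


def symmetric_eigenvalues(m):
    """Eigenvalues of a real symmetric 3x3 matrix, largest first."""
    off = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if off == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p = math.sqrt((sum((m[k][k] - q) ** 2 for k in range(3)) + 2.0 * off) / 6.0)
    b = [[(m[r][c] - (q if r == c else 0.0)) / p for c in range(3)] for r in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    lam1 = q + 2.0 * p * math.cos(phi)
    lam3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [lam1, 3.0 * q - lam1 - lam3, lam3]


def event_shapes(momenta, linearized=False):
    """S, A, Y and D built from the ordered, unit-trace eigenvalues."""
    l1, l2, l3 = symmetric_eigenvalues(momentum_tensor(momenta, linearized))
    return {
        "S": 1.5 * (l2 + l3),
        "A": 1.5 * l3,
        "Y": math.sqrt(3.0) / 2.0 * (l2 - l3),
        "D": 27.0 * l1 * l2 * l3,
    }


if __name__ == "__main__":
    two_jet = [(1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
    isotropic = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    print("two-jet:  ", event_shapes(two_jet))
    print("isotropic:", event_shapes(isotropic))
```

Passing `linearized=True` gives the analogous quantities from the second tensor, so the same function covers both members of each correlated pair discussed above.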
3.10 Missing $E_T$ variables

Missing $E_T$ and missing $p_T$ are not included in this analysis, other than the neutrino used to
reconstruct the W in the WH channel. From an experimental point of view, missing $E_T$ is an
extremely important variable. However, its distribution is dominated strongly by experimental
mis-measurements and calorimeter resolution. Without a full detector simulation, missing
energy is inappropriate for our particle-level study and, in fact, causes unphysical instabilities
if included in the multi-variable analysis.



4. Efficiency measures 

Having cataloged our input variables, we can now explore which ones have the best
distinguishing power. For single variables, we look at the effect that a simple cut or window
(two-sided cut) has on the number of signal and background events in our fiducial $m_{bb}$
window. To combine variables, we will use more sophisticated methods, described in the next
section. But for single variables, cuts are as good as one can do.

Any given cut will keep some fraction $\varepsilon_S$ of signal events and some other fraction
$\varepsilon_B$ of background events. These are the signal and background efficiencies of the cut. For
the two-sided cuts, there are a number of choices which give the same $\varepsilon_S$, so the one which
minimizes $\varepsilon_B$ for a given $\varepsilon_S$ is chosen. A standard way to visualize the relationship
between $\varepsilon_S$ and $\varepsilon_B$ is with a "Receiver Operating Characteristic" curve, or ROC curve.
Often the background rejection $1 - \varepsilon_B$, or the inverse rejection, is plotted against the
signal efficiency $\varepsilon_S$. A sample ROC curve for 10 representative single variables is shown in
Figure 15. Lower lines are better, since they give more background rejection for the same signal
efficiency, but since some of the curves cross, the ROC curves do not lead to an immediate
observation of which variables are better. One approach to ranking variables would be to
arbitrarily demand a particular signal efficiency and find the variables that give the best
background rejection, but we propose a more useful procedure that does not introduce an arbitrary
parameter.

In order to quantify the usefulness of a variable, we need to consider the goal of the signal
search. The signal-to-background ratio ($S/B$) and the significance ($\sigma \sim S/\sqrt{B}$) are the
two quantities considered when trying to see a signal over a large background. Making a tighter
cut will reduce both efficiencies, but not necessarily by the same amount, so $S/B$ may change.
The factor by which $S/B$ changes is just the ratio of the efficiencies:

$\left.\frac{S}{B}\right|_{\text{cut}} = \frac{\varepsilon_S S}{\varepsilon_B B} = \left(\frac{\varepsilon_S}{\varepsilon_B}\right) \frac{S}{B}$ . (4.1)

The other quantity normally considered is the statistical significance $\sigma$ which, for large
numbers of events, approaches $S/\sqrt{B}$. For a convincing discovery, the expected number of signal
events must exceed the statistical fluctuations of the background:

$\frac{S}{\sqrt{B}} > 1$ . (4.2)

[Figure 15 panel: LHC ZH, ROC curve: $\varepsilon_B$ vs $\varepsilon_S$.]

Figure 15: ROC curves for some variables, showing the background efficiency and signal efficiency
as parametric functions of the cut value.



When we cut on our discriminant, the significance changes by

$\left.\frac{S}{\sqrt{B}}\right|_{\text{cut}} = \frac{\varepsilon_S S}{\sqrt{\varepsilon_B B}} = \left(\frac{\varepsilon_S}{\sqrt{\varepsilon_B}}\right) \frac{S}{\sqrt{B}}$ . (4.3)

It follows that the two quantities we are interested in are the signal-over-background
improvement characteristic

$\frac{\varepsilon_S}{\varepsilon_B}$ , (4.4)

and the Significance Improvement Characteristic (SIC)

$\text{SIC} = \frac{\varepsilon_S}{\sqrt{\varepsilon_B}}$ . (4.5)

These quantities tell us the improvement on $S/B$ and $\sigma$ that our discriminant will give.
When systematics dominate, $\varepsilon_S/\varepsilon_B$ is more important, and when statistical errors
dominate, SIC is the more useful measure. Given that $\varepsilon_S/\varepsilon_B$ and SIC are luminosity
independent, they provide good measures of the relative discrimination power of the various
variables. Moreover, plotting these measures, especially SIC, as functions of $\varepsilon_S$ provides a
wonderful visualization of a variable's effectiveness.
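
To illustrate how such curves might be produced for a single variable, here is a minimal Python sketch that scans one-sided cuts over two samples and tabulates $\varepsilon_S$, $\varepsilon_B$, $\varepsilon_S/\varepsilon_B$ and SIC. The function names, the choice of a simple "keep events above the cut" scan, and the Gaussian toy samples are assumptions for this example; two-sided windows would need an extra loop over the second edge.

```python
import math
import random


def efficiency_curves(signal, background, n_cuts=100):
    """Scan cuts 'x > cut'; return rows (eps_S, eps_B, eps_S/eps_B, SIC)."""
    lo = min(min(signal), min(background))
    hi = max(max(signal), max(background))
    rows = []
    for k in range(n_cuts + 1):
        cut = lo + (hi - lo) * k / n_cuts
        eps_s = sum(x > cut for x in signal) / len(signal)
        eps_b = sum(x > cut for x in background) / len(background)
        if eps_b == 0.0:
            continue  # both ratios are undefined once no background survives
        rows.append((eps_s, eps_b, eps_s / eps_b, eps_s / math.sqrt(eps_b)))
    return rows


def max_sic(rows):
    """Largest significance improvement found along the scan."""
    return max(row[3] for row in rows)


if __name__ == "__main__":
    random.seed(1)
    toy_signal = [random.gauss(1.0, 1.0) for _ in range(5000)]
    toy_background = [random.gauss(0.0, 1.0) for _ in range(5000)]
    curves = efficiency_curves(toy_signal, toy_background)
    print("max SIC over the scan:", max_sic(curves))
```

With finite samples the ratios become noisy at small $\varepsilon_B$, so in practice one may want to stop the scan once too few background events survive.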



We show $\varepsilon_S/\varepsilon_B$ and SIC as a function of $\varepsilon_S$ in Figure 16 for the same 10
variables as in the ROC curve in Figure 15. There are a few salient features of these more powerful
visualizations. First, consider the $S/B$ curves. There are apparently two classes of variables. For
one class (including $\Delta\eta$ and other angle-type variables), the $S/B$ is essentially flat below
$\varepsilon_S \sim 0.7$ or so. This means that the variable cannot distinguish signal from background
beyond this point, and cutting harder on these variables is equivalent to throwing out random
events. The other class (including $p_T$ and other variables with long tails) seems to lead to $S/B$
enhancements which grow indefinitely towards $\varepsilon_S = 0$. This means that if sufficiently hard
cuts are placed, a very large improvement in $S/B$ can be achieved. This often happens because there
is some limit of the variable in which there are zero background events. These qualitative
observations notwithstanding, it is not at all clear, by looking at the $\varepsilon_S/\varepsilon_B$ curves
alone, which variable does best, or where to apply the cut. For this reason, we will focus less on
these curves for the multivariate analysis than on SIC.

Figure 16: Improvements in $S/B$ and $\sigma = S/\sqrt{B}$ for the same variables as in the previous figure.

The SIC curves provide a much better way of looking at the information than the
$\varepsilon_S/\varepsilon_B$ curves. We see that there are still essentially two classes. For one class
(including $\Delta\eta$), the SIC curves have a maximum at some intermediate value of $\varepsilon_S$.
Thus, there is an optimal value of the cut for these variables to get the maximal enhancement in
$S/\sqrt{B}$. Because of this maximum, we argue that

$\overline{\text{SIC}} = \max_{\varepsilon_S} \frac{\varepsilon_S}{\sqrt{\varepsilon_B}}$ (4.6)

is a useful characterization of the effectiveness of a discrimination method. The other class
of variables (including $p_T$) seem to only be able to reduce $S/\sqrt{B}$ and have $\overline{\text{SIC}} = 1$.

The variables which can lead to $S/\sqrt{B}$ improvement are often ones which had flat $S/B$
distributions. Also, the variables which lead to arbitrarily large $S/B$ enhancements often can
only reduce $S/\sqrt{B}$. For example, $p_T$ is one of the second class of variables: the original
efficiencies from the boosted Higgs boson search [?] were $\varepsilon_S = 1/20$ and $\varepsilon_B = 1/320$.
We see that $\varepsilon_S/\varepsilon_B = 16$ while $\varepsilon_S/\sqrt{\varepsilon_B} = 0.89$.
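
As a quick check of these two numbers, inserting the quoted efficiencies into the definitions (4.4) and (4.5) gives:

```latex
\frac{\varepsilon_S}{\varepsilon_B} = \frac{1/20}{1/320} = \frac{320}{20} = 16 ,
\qquad
\frac{\varepsilon_S}{\sqrt{\varepsilon_B}} = \frac{1/20}{1/\sqrt{320}}
  = \frac{\sqrt{320}}{20} = \frac{2}{\sqrt{5}} \approx 0.894 .
```

So the same cut that improves $S/B$ by a factor of 16 slightly lowers $S/\sqrt{B}$, because the signal efficiency falls faster than the square root of the background efficiency.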

Starting from these observations, we turn to the multivariate analysis. Among other
things, we will find that some variables which are useless as single variables (i.e. $\overline{\text{SIC}} = 1$)
start to add marginal efficiency on top of the ones which are useful to begin with. Moreover,
we will see that the optimal set of 10 variables is very different from the top 10 single variables.
This makes perfect sense: less useful variables can become useful after cuts on more powerful
variables are applied, because they have subtle correlations. However, this means we have
to carefully decide on which variables to use, in order to avoid throwing out some potentially
useful ones which are not useful by themselves. A systematic procedure for selecting a set
of variables is described in Section 6. First, we give a brief introduction to the multivariate
methods.

5. Multivariate Techniques 

In order to make use of all possible discriminating features between signal and background 
samples, including complex non-linear correlations, we will make use of a multivariate ap- 
proach. Many multivariate techniques are efficiently implemented in the Toolkit for 
Multivariate Data Analysis with ROOT (TMVA) in a way that makes them easy 
to use and compare. We refer the reader to the TMVA documentation for more details of the 
methods [p^| . 

In this study, we will use mainly Boosted Decision Trees (BDTs) [25]. We briefly considered
other methods, such as multilayer perceptron Artificial Neural Networks (ANNs). We
found that BDTs tend to converge faster and run in shorter time than ANNs while giving
similar results. We suspect that ANNs may be optimal for other applications, such as
artificial intelligence, but for high energy physics analyses, they are not entirely appropriate.
The main feature that distinguishes the particle physics applications is that we are almost
always interested in a binary classification: signal vs background. Neural networks seem
better suited for multidimensional outputs, such as in pattern recognition tasks. (For a more
detailed comparison in the context of a particle physics application, with similar conclusions,
see [26, 27].) We also found that a traditional Bayesian likelihood analysis or optimal linear
Fisher discriminant is comparable to the BDT for few variables, but does significantly worse
when multiple variables are combined. Performing a thorough comparison of methods in the
context of collider physics is beyond the scope of the current study. The results in this paper
will all refer to discrimination using the BDT method, which we found optimal in our informal
survey of different methods and different parameters.

A decision tree is a hierarchical set of one-sided cuts used to classify signal (S) versus
background (B). For example, if there are two discriminants $a$ and $b$, a tree might be: if $a > 2$
then {if $b < 4$ then S, else B}, else {if $b > 5$ then S, else B}. One can, in principle, find the
single best tree for discriminating S from B within a given Monte Carlo sample of training events,
and given certain constraints on how many cuts we are allowed to apply. Once this tree is found,
one can then attempt to train another tree to classify correctly the events that the first tree
misclassified (i.e. background events that end up in regions designated as "signal-like" by the
tree, and vice-versa). Then one can train a third tree to attempt to correct misclassifications
of the second tree, and so on. Typically, this is repeated until a "forest" of O(1000) trees has
been constructed. Each successive tree is trained on the same Monte Carlo sample, but at
each stage a boost is applied: misclassified events from the previous tree are increased in
weight, so that the next tree will work harder to better classify them. After building up the
forest, a weighted vote is taken between all of the trees to form the final discriminant at each
point in the multivariate phase space. By varying a cut on the weighted vote, i.e. asking that
at least x% of the trees classify an event as signal for it to pass, the BDT provides a nearly
continuous efficiency measure parametrized by x. Thus, varying x can generate the ROC and
SIC curves for our analysis.
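
The boosting loop described above can be sketched with depth-one trees ("stumps", i.e. a single one-sided cut each) and the standard discrete AdaBoost reweighting. This is a toy illustration in plain Python under those assumptions, with hypothetical function names; it is not the TMVA implementation or configuration used for the results in this paper, which uses deeper trees and many more of them.

```python
import math


def stump_predict(stump, event):
    """A stump is (variable index, threshold, sign): +1 = signal, -1 = background."""
    var, threshold, sign = stump
    return sign if event[var] > threshold else -sign


def train_stump(events, labels, weights):
    """Exhaustively find the one-sided cut with the smallest weighted error."""
    best_err, best_stump = None, None
    for var in range(len(events[0])):
        for threshold in sorted({e[var] for e in events}):
            for sign in (+1, -1):
                stump = (var, threshold, sign)
                err = sum(w for e, y, w in zip(events, labels, weights)
                          if stump_predict(stump, e) != y)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def train_boosted_stumps(events, labels, n_trees):
    """Discrete AdaBoost: reweight misclassified events after every tree."""
    n = len(events)
    weights = [1.0 / n] * n
    forest = []
    for _ in range(n_trees):
        err, stump = train_stump(events, labels, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        forest.append((alpha, stump))
        # The boost: misclassified events gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, e))
                   for e, y, w in zip(events, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def bdt_score(forest, event):
    """Weighted vote of all trees, normalized to lie in [-1, 1]."""
    total = sum(alpha for alpha, _ in forest)
    if total == 0.0:
        return 0.0
    return sum(alpha * stump_predict(s, event) for alpha, s in forest) / total


if __name__ == "__main__":
    # Signal sits in a window of the single variable, so no single cut works.
    events = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
    labels = [-1, -1, +1, +1, -1, -1]
    forest = train_boosted_stumps(events, labels, n_trees=3)
    for event, label in zip(events, labels):
        print(event, label, round(bdt_score(forest, event), 3))
```

Cutting on `bdt_score` at different thresholds plays the role of the parameter x in the text: each threshold gives one point $(\varepsilon_S, \varepsilon_B)$ on the ROC and SIC curves.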

For a single variable, a single tree is sufficient. If the variable is monotonic in signal and 
background, a one-sided cut can be proven optimal. If the distributions rise and then fall, a 
two-sided cut is optimal. For more than one variable, the optimal solution can be calculated 
exactly, using a Bayesian likelihood. However, to use the Bayesian approach one needs to 
know the distributions essentially analytically. This is impossible when many variables are 
involved, and one can only sample phase space. In this case, the best one can do is to know 
the differential cross sections for events in the vicinity of the candidate event's kinematics. 
For such large under-sampled phase space, the Bayesian likelihood approach is inefficient and 
multivariate techniques like BDTs are needed. 

As an example, consider the signal and background distributions for two angle-type variables
(a $\Delta\theta$ and a $\Delta\eta$ variable), as shown in Figure 17. These two variables are chosen
simply because they have unusual correlations. With our large event samples, we can produce an
essentially smooth 2-dimensional distribution in these variables. Each axis in these figures is
sampled into 50 divisions, leading to 2500 bins. From these distributions, we compute the "exact" 2D
probability density, as shown in the third panel of this figure. For two variables, the phase
space is sampled finely enough that this full likelihood discriminant is computable. Next, we
ask how well a BDT classifier can reproduce this likelihood distribution. In Figure 18, we
present the result of a BDT using 2, 8 and 256 trees trained on the same 2-dimensional data
set. Even with their rectangular cuts, we see that the BDTs do a good job characterizing
the correlations of the full 2D probability density. Certainly 8 and even 256 trees require
many fewer events to train than are needed to sufficiently populate the 2500 bins of the
sampled likelihood. For higher dimensional phase space, we find that a reasonable number of
trees continues to sample the space well, while a uniform sampling required for a likelihood
approach is impossible. We use up to 3000 trees for 10 variables, although our results barely
change beyond 400 trees.
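
The binned "exact" likelihood used for this two-variable comparison can be sketched as follows. The sketch normalizes the signal and background histograms to equal area and forms $L = s/(s+b)$ bin by bin, assigning 0.5 to bins with no events, as in the captions of Figures 17 and 18. The function names and the clamping of out-of-range values into the edge bins are assumptions for this example.

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the bin containing value; out-of-range values go to edge bins."""
    k = int((value - lo) / (hi - lo) * n_bins)
    return min(max(k, 0), n_bins - 1)


def histogram_2d(points, x_range, y_range, n_bins):
    """Unit-area 2D histogram of (x, y) points on an n_bins x n_bins grid."""
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        ix = bin_index(x, x_range[0], x_range[1], n_bins)
        iy = bin_index(y, y_range[0], y_range[1], n_bins)
        counts[ix][iy] += 1.0
    total = float(len(points))
    return [[c / total for c in row] for row in counts]


def likelihood_map(signal, background, x_range, y_range, n_bins=50):
    """Per-bin L = s / (s + b); bins with no events are assigned 0.5."""
    s = histogram_2d(signal, x_range, y_range, n_bins)
    b = histogram_2d(background, x_range, y_range, n_bins)
    return [[s[i][j] / (s[i][j] + b[i][j]) if s[i][j] + b[i][j] > 0.0 else 0.5
             for j in range(n_bins)] for i in range(n_bins)]


if __name__ == "__main__":
    toy_signal = [(0.1, 0.1), (0.1, 0.1), (0.9, 0.9)]
    toy_background = [(0.9, 0.9), (0.9, 0.9), (0.1, 0.9)]
    for row in likelihood_map(toy_signal, toy_background, (0.0, 1.0), (0.0, 1.0), n_bins=2):
        print(row)
```

The number of bins grows as the bin count per axis raised to the number of variables, which is why this direct approach stops being practical beyond a few variables.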

Even within the BDT method, there are many different ways to construct classifiers. For
example, one can train successive trees on events misclassified by previous trees, as described
above. Alternatively, one could train trees on a random subset of events (the Random Forest
approach). For each tree, one can limit the number of branches or the depth of the tree. And
of course the way the subset of relevant input variables for each tree is chosen is critically
important as well. We have not attempted to systematically optimize the BDT method in
this paper.
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Figure 17: 2D histograms for a particular pair of variables (event counts in each of 50 x 50 bins). 
After the histograms are normalized to equal area, the counts for each corresponding bin of the signal 
and background histograms can be combined into a likelihood estimate where L = s/(s + b). (For a 
different relative normalization, the shapes of constant probability contours would be identical, but 
the values would change in a monotonic way. However, the contour for the probability that yields a 
particular signal efficiency does not change, nor does the ROC curve.) 




Figure 18: Boosted Decision Tree (BDT) approximations of the above "exact" likelihood estimate for 
2, 8, and 256 trees. For regions of no signal and no background events, we defined the probability as 
50% whereas the BDT's rectangles tend to take the score of the nearest event. Note that if appropriate 
absolute values were taken, this particular pair of variables would be linearly correlated and a suitable 
linear transformation could be found to decorrelate them and make the job of the BDT's rectangular 
cuts easier. Taking absolute values of all symmetric distributions might hide important correlations 
with additional variables. 

6. Combining variables 

Having discussed the variables and the boosted decision tree approach to classification, we
can now attempt to construct strong discriminants. We start by reducing our thousands of
variables to approximately 800 reasonably different ones. We then calculate the significance
improvement, and sort the variables by their $\overline{\text{SIC}} = \max(\varepsilon_S/\sqrt{\varepsilon_B})$ values.

Figure 19: SIC curves for the top 10 individual variables.

Figure 20: Linear correlation coefficients for top 10 individual variables for both signal (left) and
background (right).

The top 10 single variables ranked this way are

A m , Zp$, Ay H>b2 , Ay Htbl , p b T \ H T , cos(^), \pf\ + \pf\, \pf\, \pf\ + \pf\ 

(6.1) 



The corresponding significance improvement curves are shown in Figure 19. This is not the
most varied possible set. In particular, none of the radiation variables made the list. More
unusual variables will resurface when many variables are combined.

To start combining variables, we first look at the linear correlations among some of the top
variables. These are shown for signal and background in Figure 20. These numbers should be
interpreted cautiously. Sometimes variables are highly correlated non-linearly, but may have
low linear correlation coefficients. Also, combining uncorrelated variables often does not help
improve the $S/\sqrt{B}$ at all, whereas combining correlated variables often does. Nevertheless, we
find the linear correlations a useful way to see which variables are measuring similar things.
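
The caution about linear correlation coefficients can be made concrete with a small Python example. The function below computes the usual Pearson coefficient; the toy data are an assumption chosen to show a pair of variables that are completely dependent (one is the square of the other) yet have zero linear correlation.

```python
import math


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print("y = 2x + 1:", pearson(xs, [2.0 * x + 1.0 for x in xs]))
    print("y = x^2:   ", pearson(xs, [x * x for x in xs]))
```

A multivariate method such as a BDT can still exploit the second kind of relationship, which is one reason the correlation matrices are only a rough guide to which variables carry overlapping information.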
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Figure 21: SIC curves for the top 10 pairs of variables using Boosted Decision Trees 



For example, this matrix shows that since $\Delta y_{H,b_2}$ and $\Delta\eta$ are 96% correlated, they
have almost exactly the same information, as expected. We will also see that the final set of
variables we choose algorithmically is not nearly as correlated as the top 10 (see Figure 23).
The $\overline{\text{SIC}}$ for each pair among these 10 kinematic variables is shown in the matrix in
Figure 22, along with how much the combination improves $\overline{\text{SIC}}$ over the better of the two.

Next, we consider the top 10 pairs from the complete set of 800 variables. To get these,
we would ideally just take every possible pairing and evaluate the combined $\overline{\text{SIC}}$. This is
marginally possible with pairs, but too computationally intensive for triplets or larger
combinations. Fortuitously, after trying all possible pairs, we observed that nearly identical
results follow from simply taking the top 3 single variables and combining pairwise with a
reduced set of 200 of the original variables. This reduced set was selected so that no two of its
variables are more than 99% correlated. We combine the pairs with the BDT method and extract
the value of $\overline{\text{SIC}}$ for each combination. The significance improvement curves for the top 10
pairs are shown in Figure 21. Note that most of the pairs involve variables not in the original top
10.

Figure 22: Two variable improvement percentage of the top 10 individual variables. The left shows
the values of SIC for the pairs (as a percentage), with the single variables' scores on the diagonal. The
right shows how much the combination improves over the better of the pair: $(R_{ij} - \max(R_i, R_j)) \cdot 100\%$.

This way of building up combinations can be iterated. The algorithm is

1. Start with the top 3 sets of n variables.

2. Add each of the reduced set of 200 variables to each set and compute the maximal significance
improvement.

3. Take the best 3 sets of n + 1 variables for the next iteration.
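
The three steps above can be sketched as a small beam search. This is only an illustrative sketch, not the code used for the study: the function name `build_variable_sets`, the `score` callback (standing in for "train a BDT on this set and return its maximal SIC"), and the toy weights are all assumptions introduced here.

```python
from typing import Callable, FrozenSet, List, Sequence

def build_variable_sets(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_keep: int = 3,
    max_size: int = 10,
) -> List[List[FrozenSet[str]]]:
    """Grow variable sets one variable at a time, keeping the n_keep best per size.

    Returns a list whose k-th entry holds the kept sets of size k + 1, best first.
    In a real analysis `score` is expensive, so its results should be cached.
    """
    # Size-1 round: rank the single variables
    singles = sorted((frozenset([v]) for v in candidates), key=score, reverse=True)
    kept = singles[:n_keep]
    history = [kept]
    for _ in range(max_size - 1):
        # Step 2: extend every kept set by every candidate not already in it
        trials = {s | {v} for s in kept for v in candidates if v not in s}
        if not trials:
            break
        # Step 3: keep the best n_keep sets of size n + 1 for the next iteration
        kept = sorted(trials, key=score, reverse=True)[:n_keep]
        history.append(kept)
    return history

# Toy score (hypothetical): additive weights, with "a" and "b" largely redundant
weights = {"a": 3.0, "b": 2.5, "c": 1.0, "d": 0.4}

def toy_score(s: FrozenSet[str]) -> float:
    total = sum(weights[v] for v in s)
    return total - 2.2 if {"a", "b"} <= s else total

history = build_variable_sets(list(weights), toy_score, n_keep=3, max_size=3)
best_triplet = history[-1][0]
print(sorted(best_triplet))
```

In the toy, the second-best single variable "b" drops out of the best triplet because it duplicates "a", which mirrors the observation that the best combined set is not simply the top individual variables.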

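Using this algorithm, we find that the top set of 10 variables for the LHC ZH sample is

$$
|p_T^{\ell_1}| + |p_T^{\ell_2}|,\quad |p_T^{H}| - |p_T^{\ell_2}|,\quad |p_T^{Z}| - |p_T^{b_2}|,\quad m_{H,b_1},\quad E_{\rm vis}^{\rm obj.},\quad E_{\rm vis}^{\rm prim.},\quad
{\rm pull}_\alpha,\quad {\rm pull}_\beta,\quad A^{-0.9}_{b_1},\quad {\rm girth}_{b_2} \qquad (6.2)
$$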

Here "obj." and "prim." refer to whether these energy variables were constructed from
reconstructed objects in the entire event or just from the primary objects (two leptons and two
$b$-jets), as discussed in Section 3.8. $A^{-0.9}_{b_1}$ is the jet angularity constructed from the hardest
$b$-jet with $a = -0.9$. This set of variables is much less correlated than the top 10 individual
variables. This is shown in Figure 23, which can be compared to Figure 20. Also, the set of
top 10 combined variables, in contrast to the top 10 individual variables, includes a number
of the showered and event shape discriminants.

We often see convergence towards a final significance when many variables are combined.
We show in Figure 24 the significance curves for the top 3 sets of 3, 4, 5, ..., 11 variables.
We see convergence at around 8 variables. The results for the WH sample as compared to
its irreducible background, and the Tevatron versions are also shown. The associated $\varepsilon_S/\varepsilon_B$
curves contain equivalent information. But since they all blow up at small $\varepsilon_S$, they are much
more difficult to interpret, and are therefore not shown.

Figure 23: Linear correlation coefficients of signal and background (LHC ZH sample) for 10 variables
which are best when combined, as determined by the algorithm defined in the text.



Note that the significance curves for ZH become poorly estimated at low $\varepsilon_S$. This is
entirely due to lower statistics due to harder cuts. In fact, these low significances correspond
to $\varepsilon_S \sim 10^{-1}$ and $\varepsilon_B \sim 10^{-3}$. It is natural to expect that the multivariate methods will
struggle in trying to characterize an 11 dimensional space to one part in $10^3$. We find no such
instabilities for the WH sample, since the background efficiencies are in general higher for
a given signal efficiency. This points to one advantage of the differential SIC visualization
technique - it demonstrates when the statistics of the Monte Carlo sample is becoming a
problem. Indeed, as we were generating the samples, we noted much instability with smaller
statistics, which led us to increase the size of the runs. However, even with 1 million
background events, $\varepsilon_B \sim 10^{-3}$ means that only of order a thousand events are controlling the final
efficiency. Efficiencies around $10^{-3}$ seem to be a practical limitation on this method. To
go further, one should put more judicious cuts on the initial sample so that the tails of the
distributions will be more accurately populated.
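
As a rough illustration of why $\varepsilon_B \sim 10^{-3}$ acts as a practical floor, one can estimate the counting error on the SIC. This is a back-of-the-envelope sketch rather than part of the analysis: it assumes $N_B$ independent, unweighted background events, neglects the smaller uncertainty on $\varepsilon_S$, and the symbols $N_B$ and $\delta$ are introduced here only for illustration.

$$
\frac{\delta\varepsilon_B}{\varepsilon_B} \simeq \frac{1}{\sqrt{N_B\,\varepsilon_B}}, \qquad
\frac{\delta\,\mathrm{SIC}}{\mathrm{SIC}} \simeq \frac{1}{2}\,\frac{\delta\varepsilon_B}{\varepsilon_B}.
$$

For $N_B = 10^6$ and $\varepsilon_B = 10^{-3}$, about $10^3$ events pass the cut, so $\delta\varepsilon_B/\varepsilon_B \simeq 3\%$ and the SIC carries a counting error of roughly 1.5%. At $\varepsilon_B = 10^{-4}$ the corresponding numbers are about 10% and 5%. These estimates cover counting alone and do not include any additional instability from training the discriminant in a sparsely populated region.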

7. Optimizing Jet Reconstruction 

Next, we return to the issue of optimizing the jet size and the $m_{b\bar{b}}$ discriminant. For the
previous sections, we have fixed the jet size to anti-$k_T$ with $R = 0.5$ and imposed a fixed
window of $90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$. This section justifies that choice, and describes some
more sophisticated options for how to treat $m_{b\bar{b}}$. Ideally, one would like to see a peak in the $m_{b\bar{b}}$
distribution by eye to claim a really satisfying Higgs boson discovery. From a mathematical
point of view, however, statistical significance is much more important than the aesthetic shape of
a curve. Thus, a more powerful approach is, rather than imposing a fixed $m_{b\bar{b}}$ window, to
include $m_{b\bar{b}}$ as a discriminant in the multivariate analysis. This will draw out correlations
between $m_{b\bar{b}}$ and the other variables. Moreover, as we will see, including even multiple $m_{b\bar{b}}$
measures does much better.

Figure 24: Significance improvement characteristics for top combinations of 1 ... 11 variables for ZH
and WH at the LHC and Tevatron. Each color corresponds to a different number of variables. For a
fixed number, the solid line corresponds to the best set, dashed to the 2nd best, and dotted to the 3rd
best, all of which go on to the next round. These samples all have $90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$, which,
with no additional discriminants, defines the reference value $\varepsilon_B = \varepsilon_S = \mathrm{SIC} = 1$.

To begin, we return to our generator-level sample, still requiring two $b$-tagged jets, but
not imposing an $m_{b\bar{b}}$ mass constraint. We first explain how an optimal window is calculated.
Figure 25 shows, on the left, the $m_{b\bar{b}}$ distribution for signal and background. For each possible
upper and lower edge of the mass window, one can calculate $\varepsilon_S$ and $\varepsilon_B$ for that window. The
distribution of $\mathrm{SIC} = \varepsilon_S/\sqrt{\varepsilon_B}$ is shown on the right. The optimal window is defined to be
the one that maximizes SIC, which for this figure, anti-$k_T$ $R = 0.5$ jets for ZH at the LHC, is
$90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$. This is the window we have been using in previous sections, and
the window that defined the reference efficiencies (SIC = 1).

Figure 25: On the left is a representative Higgs invariant mass $m_{b\bar{b}}$ distribution for anti-$k_T$ $R = 0.5$
jets for ZH signal (solid blue) and $Zb\bar{b}$ background (hashed red) at the LHC with selection cuts. On
the right, the significance improvement can be directly calculated for any two-sided cut. The maximum
here gives our Higgs mass-window constraint of $90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$.
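
The window scan described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function `best_mass_window`, the `min_bkg` guard, the 5 GeV grid, and the toy Gaussian-on-flat samples are all hypothetical. The efficiencies here are measured relative to the full input samples, so the SIC values are normalized differently from those quoted in the text, which are relative to the reference window.

```python
import numpy as np

def best_mass_window(m_sig, m_bkg, edges, min_bkg=10):
    """Scan every (low, high) pair of grid edges; return (SIC, low, high) at the maximum.

    Efficiencies are the fractions of each input sample with low <= m < high.
    Windows holding fewer than min_bkg background events are skipped, an assumed
    guard against maxima driven by statistical fluctuations.
    """
    m_sig = np.sort(np.asarray(m_sig, dtype=float))
    m_bkg = np.sort(np.asarray(m_bkg, dtype=float))
    edges = np.asarray(edges, dtype=float)
    # Number of events strictly below each candidate edge
    s_below = np.searchsorted(m_sig, edges, side="left")
    b_below = np.searchsorted(m_bkg, edges, side="left")
    best = (0.0, None, None)
    for i in range(len(edges) - 1):
        for j in range(i + 1, len(edges)):
            n_b = b_below[j] - b_below[i]
            if n_b < min_bkg:
                continue
            eps_s = (s_below[j] - s_below[i]) / len(m_sig)
            eps_b = n_b / len(m_bkg)
            sic = eps_s / np.sqrt(eps_b)
            if sic > best[0]:
                best = (float(sic), float(edges[i]), float(edges[j]))
    return best

# Toy inputs (not the paper's samples): Gaussian signal peak on a flat background
rng = np.random.default_rng(0)
toy_sig = rng.normal(115.0, 10.0, size=20000)
toy_bkg = rng.uniform(0.0, 300.0, size=20000)
sic, low, high = best_mass_window(toy_sig, toy_bkg, np.arange(0.0, 301.0, 5.0))
print(f"best window: {low:.0f} GeV < m < {high:.0f} GeV, SIC = {sic:.2f}")
```

The cumulative counts make each window evaluation constant-time, so the full two-dimensional scan over lower and upper edges stays cheap even on a fine grid.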




Figure 26: Significance improvement characteristics varying only the $m_{b\bar{b}}$ window for different jet
types. SICs are less than 1 because the reference value is defined using the optimal jet algorithm:
anti-$k_T$ with $R = 0.5$ with optimal mass cut $90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$. The right panel shows a
zoom-in of the peak region in the left panel.



Figure 26 shows the SIC curves for three jet algorithms, $k_T$ [28], anti-$k_T$ [15] and
Cambridge/Aachen (C/A) [29], and for jet sizes from 0.4 to 0.7. The best anti-$k_T$ jets (solid lines)
beat out the best Cambridge/Aachen (dotted) and $k_T$ (dashed), but they are all quite close.
The optimal choice seems to be anti-$k_T$ with $R = 0.5$ for the LHC and anti-$k_T$ with $R = 0.7$ for the Tevatron, but
the response of real calorimeters might change this. The SIC curves are very similar for ZH
and WH at both machines.

Figure 27: 2D histograms of $m_{b\bar{b}}$ for mild and aggressively trimmed jets show their correlation.
Starting with anti-$k_T$ $R = 0.5$ jets, aggressive trimming means keeping only 0.2 anti-$k_T$ subjets whose
$p_T$ is more than 50% of the original jet $p_T$; mild trimming means keeping 0.05 $k_T$ subjets with more
than 1% of the original jet's $p_T$. After each jet is trimmed, the invariant mass of the pair is
calculated.



Next, we consider multiple mass measures. This idea was inspired by the work of Soper
and Spannowsky in Ref. [30]. They considered the effect of combining pruning and trimming
for the highly boosted ZH events, and found that the two were somewhat complementary.
Pruning [31] attempts to find evidence for a heavy particle decaying to boosted collimated
hadronic jets, while trimming [8] is designed to remove contamination from initial-state
radiation and the underlying event. Both trimming and pruning were modifications of the filtering
procedure used for boosted ZH search [32] and for top-tagging [33]. For a review, see [34]
or [32]. Since our $b$'s do not appear in a single fat jet, we restricted our consideration to
the trimming algorithm. We wanted to see whether combining mass measures with different
amounts of trimming could improve over a single jet type.

The jet trimming starts with a jet, say one of our original anti-$k_T$ $R = 0.5$ $b$-jets. Then
one reclusters the jet with a smaller jet size $r$, say $r = 0.1$. If the energy of any of the
smaller jets is less than a fraction $f$ of the original jet's energy, the subjet is tossed. Then
the remaining subjets are recombined into a trimmed jet by adding their 4-momenta. This
procedure naturally removes soft radiation, representative of underlying event contamination
or soft ISR, while keeping the hard collinear radiation from the final state shower. Trimming
has two parameters, $r$ and $f$, along with the jet algorithm used to form the subjets.
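
The selection and recombination steps of trimming can be illustrated as follows. This sketch assumes the reclustering into subjets of size $r$ has already been done by a jet-finding package, so only the energy-fraction cut and the 4-momentum sum are shown. The names `trim`, `add`, `mass`, and `trimmed_pair_mass`, and the example subjets, are hypothetical.

```python
import math
from typing import Sequence, Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz)

def add(vectors: Sequence[FourVector]) -> FourVector:
    """Component-wise sum of 4-momenta (zero vector for an empty list)."""
    if not vectors:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(sum(c) for c in zip(*vectors))

def mass(p: FourVector) -> float:
    """Invariant mass, with negative rounding artifacts clipped to zero."""
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def trim(subjets: Sequence[FourVector], f: float) -> FourVector:
    """Drop subjets with energy below f times the original jet energy, then recombine."""
    jet_energy = add(subjets)[0]
    kept = [s for s in subjets if s[0] >= f * jet_energy]
    return add(kept)

def trimmed_pair_mass(subjets_1, subjets_2, f: float) -> float:
    """Invariant mass of two jets after each has been trimmed with the same f."""
    return mass(add([trim(subjets_1, f), trim(subjets_2, f)]))

# One toy jet made of three massless subjets; the last carries 2% of the energy
toy_subjets = [(60.0, 60.0, 0.0, 0.0), (38.0, 0.0, 38.0, 0.0), (2.0, 0.0, 0.0, 2.0)]
mild = trim(toy_subjets, 0.01)        # keeps all three subjets
harder = trim(toy_subjets, 0.05)      # removes the 2% subjet
print(mass(mild), mass(harder))
```

Calling `trimmed_pair_mass` with two different values of `f` on the same pair of jets yields the two mass measures whose combination is studied below.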

First, we looked at a single $m_{b\bar{b}}$ measure with trimming on both of the $b$-jets. We did
not find that any values of the parameters were significantly better than no trimming. On
the other hand, we found that significant improvement did result from combining different
trimmed masses. We define two extreme trimmings: mild trimming, with $r = 0.05$ and $f = 1\%$,
and aggressive trimming, with $r = 0.2$ and $f = 50\%$. The two dimensional distribution of
$m_{b\bar{b}}$ with mild and aggressive trimming of both $b$-jets is shown in Figure 27. One can see that
$m_{b\bar{b}}$ for the mild and aggressive trimmed jets are strongly correlated for the background,
as evidenced by the long flat direction, but much less correlated in the signal. By eye, one
can see that drawing a contour to separate signal from background will do better than any
single line or rectangular window.

Figure 28: Significance improvement $\varepsilon_S/\sqrt{\varepsilon_B}$ with various combinations of $m_{b\bar{b}}$ constructed from
trimmed and untrimmed jets.

Figure 28 shows the significance improvement for various combinations of mild,
aggressive, and no trimming. Note that while mild trimming by itself is almost identical to no
trimming, when combined with aggressive trimming, mild does better than not trimming at
all. Combining all three does not improve over mild + aggressive alone.

After concluding that multiple mass measures can improve the significance, we then
included the mass measures with the other discriminants in the multivariate analysis. We
found that for just kinematic variables, having multiple trimmed masses does help a little.
However, when the radiation variables are included, the multiple mass measures have no
effect on the SIC curves. This seems to hold for the ZH and WH samples at the Tevatron or
the LHC. We saw no effect either by adding the mass measures from mild and aggressively
trimmed jets on top of the top 10 variables or by incorporating about 100 $m_{b\bar{b}}$ measures
directly into the multivariate mix. Thus, it seems that while multiple trimmings by themselves
are useful, they share much information with many of the showered variables.

Table 4: Top variables that showed up at the stages indicated. The first number is the first stage at
which they appeared, the second number is the last stage at which they appeared. Variables that ended
up in the top 10 are indicated with a 10. "obj." and "prim." refer to whether reconstructed objects or
the primary four objects (2 b-jets and 2 leptons) were used, "angul." are the angularities.



8. Final results 



Our final results are shown in Figure 29. The top two panels show the significance improvement
characteristics for the Tevatron, and the bottom two panels for the LHC. Our variables are
listed in Table 4, and combined using BDT with 3000 trees. For the Tevatron results, we can
get a sense of how much better we do by comparing to sets of variables used by CDF and D0.
The SIC curves for our implementation of a subset$^1$ of their variables, as listed in Table 5,
are also shown in Figure 29. We see that around a 10-20% improvement against irreducible
backgrounds is possible. There are a few important caveats associated with this conclusion:
we are only considering irreducible backgrounds, we base our study entirely on particle-level
Monte Carlo, we have not considered experimental or theoretical systematic uncertainties,
and we do not know if all of these variables can even be measured. Nevertheless, our results
do imply that future study is warranted with potentially significant gains.

$^1$ We include all the variables used by these groups, except for the ones which depend on missing energy.
The missing energy variables are mostly useful in the WH case for removing the $t\bar{t}$ background, which we are
not considering here. See also Section 3.10.

Table 5: Subsets of variables used in recent CDF [1] and D0 [2] analyses which we use to compare
our efficiencies.

In the bottom two panels, we present our results for the LHC. While we are not aware of
any published multivariate approach to this channel by either ATLAS or CMS, as a reference,
we take the ATLAS search which used the high $p_T$ sample [6]. Comparing the effectiveness
of the $p_T$ cut to the multivariate approach, we can see that the multivariate approach is clearly
superior. Again, although one cannot translate this directly into an improvement in sensitivity, it
is reasonable to expect that since the boosted searches with $p_T^H > 200$ GeV have greater than
$3\sigma$ discovery potential with 30 fb$^{-1}$, a multivariate approach including the lower $p_T$ events
would be at least as strong. Part of the motivation of the hard $p_T$ cut was to kill WH's $t\bar{t}$
background, an issue we have not addressed. Nevertheless, one can conclude from our analysis
that a proper, more complete multivariate study of the light Higgs boson discovery potential
in W/Z + H is worth pursuing.

The list of top variables is given in Table 4. The first few variables tend to be $p_T$ and
rapidity differences, then other angular variables like twist or global variables like centrality
or $H_T$. Starting around the 4th variable, pull consistently proves useful, followed by other
jet properties like girth, angularities, or subjet average $p_T$'s. Since our particle-level analysis
does not take into account detector effects, experimental jet calibration, or missing energy,
the best variables to be used experimentally will likely differ from those in this list.

Figure 29: SIC curves for sets of variables discussed in the text, along with the appropriate subset of
the variables used by CDF and D0, as listed in Table 5. All curves include $m_{b\bar{b}}$ as one of the variables,
as opposed to previous figures which were with a fixed $m_{b\bar{b}}$ window. The reference SIC of 1 is still
with respect to the hard cut $90~\mathrm{GeV} < m_{b\bar{b}} < 124~\mathrm{GeV}$. Numbers refer to number of variables going
into the BDT (e.g. 11 = top 10 + 1 for $m_{b\bar{b}}$). Curves labeled kinematic have the radiation variables
(pull, girth, angularities) removed.



9. Conclusions 



We explored Higgs boson production in association with a Z or W boson, with $H \to b\bar{b}$. This
served as a case study in optimizing multivariate discriminants. We attempted to systematically
consider an enormous number of discriminants, including kinematic variables natural
from an experimental point of view (e.g. azimuthal angle differences), variables natural from
a physical point of view (e.g. helicity angle of the Higgs boson decay products), variables
dependent on final state radiation (e.g. pull), and even unmotivated kinematic variables (e.g.
scalar sum of the $p_T$ of the harder $b$-jet and the softer lepton from the W/Z decay). Some of
the variables we introduced, such as azilicity and twist, should be more generally useful.

We employed a simple algorithm to combine the variables systematically: take the top
single variables, and combine pairwise with every other variable. Then take the top pairs
and combine them with every other variable, and so on. We found convergence at around 8
or 9 variables, with the 5th or 6th variable still adding substantial significance improvement.
Interestingly, we found that many 'naive' choices of 8 good variables differ substantially from
the optimal choice. Some variables, such as pull, are not very useful by themselves, but
make a strong appearance as the 4th or 5th variable, leading to a relative 20% or so $S/\sqrt{B}$
improvement. It is hard to develop intuition for some of the more obscure variables, so
the use of this algorithmic procedure provides a great advantage. If one of many obscure
variables proves powerful, it can become a focal point for new theoretical understanding and
experimental study.

In order to see the marginal significance enhancements, we observed that the significance
improvement $\mathrm{SIC} = \varepsilon_S/\sqrt{\varepsilon_B}$, viewed as a function of the signal efficiency $\varepsilon_S$, provides
a very powerful visualization technique. The significance improvement curves consistently
demonstrate at least three things:

1. They show how the improvements converge when more variables are added. We found
stability at around 8 or 9.

2. They manifest instabilities when the Monte Carlo samples are inadequate. For example,
with around one million input events, the instabilities begin when $\varepsilon_S$ or $\varepsilon_B$ are of order
$10^{-3}$ or less.

3. The curves have maxima at finite values of $\varepsilon_S$ (in contrast to $\varepsilon_S/\varepsilon_B$, which generally
diverges at small $\varepsilon_S$). The SIC at this value, $\overline{\mathrm{SIC}}$, provides a quantitative measure of
the improvement of $S/\sqrt{B}$ when the corresponding discriminant is used.

In contrast, the ROC curves, which show $\varepsilon_B$ as a function of $\varepsilon_S$, and $\varepsilon_S/\varepsilon_B$ curves, are harder
to interpret in terms of search optimization, even though they contain the same numerical
information as the SIC curves. Nevertheless, since S/B is in fact relevant to the final analysis,
one might not want to choose $\varepsilon_S$ exactly to optimize SIC, but rather to choose a somewhat
lower value, which increases S/B at a small cost to SIC. This allows for the actual significance
measure, with systematic efficiencies included, to be maximized.
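
For concreteness, here is a minimal sketch of how a SIC curve can be built from the output scores of any discriminant. It is an illustration under stated assumptions rather than the code used in this paper: the function `sic_curve`, the `min_bkg` cutoff that hides the statistically unstable tail, and the toy Gaussian scores are all hypothetical.

```python
import numpy as np

def sic_curve(sig_scores, bkg_scores, min_bkg=100):
    """Return (eps_s, eps_b, sic) for every cut 'score > t'.

    The first point is the no-cut reference, eps_s = eps_b = SIC = 1.
    Cuts that leave fewer than min_bkg background events are dropped.
    """
    sig = np.sort(np.asarray(sig_scores, dtype=float))
    bkg = np.sort(np.asarray(bkg_scores, dtype=float))
    thresholds = np.concatenate([[-np.inf], np.unique(np.concatenate([sig, bkg]))])
    # Number of events strictly above each threshold
    n_s = len(sig) - np.searchsorted(sig, thresholds, side="right")
    n_b = len(bkg) - np.searchsorted(bkg, thresholds, side="right")
    ok = n_b >= min_bkg
    eps_s = n_s[ok] / len(sig)
    eps_b = n_b[ok] / len(bkg)
    return eps_s, eps_b, eps_s / np.sqrt(eps_b)

# Toy scores (not from the paper): unit-width Gaussians separated by one sigma
rng = np.random.default_rng(1)
eps_s, eps_b, sic = sic_curve(rng.normal(1.0, 1.0, 50000), rng.normal(0.0, 1.0, 50000))
i_max = int(np.argmax(sic))
print(f"max SIC = {sic[i_max]:.3f} at eps_s = {eps_s[i_max]:.2f}")
```

Because the same arrays also give $\varepsilon_S/\varepsilon_B$ at every cut, one can read off how much S/B is gained by moving to a signal efficiency somewhat below the SIC maximum.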

We also considered various ways to measure the $b\bar{b}$ invariant mass, which should have a
peak near $m_H$ for the signal sample. We found that the best choice of jet algorithm and size,
out of $k_T$, anti-$k_T$ and Cambridge/Aachen, is anti-$k_T$ with $R = 0.5$ at the LHC and anti-$k_T$ with
$R = 0.7$ at the Tevatron, although the algorithm and jet size dependence is not very strong.
We also found that using trimmed jets to construct a single $m_{b\bar{b}}$ does not seem to help much.
It seems that optimizing the jet size can compensate for the effect of trimming. On the other
hand, we found that trimming gives us an important new handle when multiple jet measures
can be combined. We found that combining $m_{b\bar{b}}$ from aggressively trimmed jets and $m_{b\bar{b}}$
from mildly trimmed jets does uniformly better than a single trimming or no trimming at all.
This was understood qualitatively from the 2D $m_{b\bar{b}}$ distribution. Nevertheless, even multiple
trimmings seem not to provide marginal improvement when combined with other variables
sensitive to radiation in the event. This is an important observation because, at this point,
trimming has never been attempted experimentally. If it turns out the same information is
contained in, say, pull, which has already been shown to be measurable, one may be able
to avoid dealing with trimmed jets. On the other hand, if it turns out that the theoretical
uncertainties on trimmed jet masses are unusually small, or trimmed jets are less sensitive
to pileup, then it may be worth using trimming instead of variables like angularities, which
are expected to degrade faster in busy events. In any case, understanding the relationship
between these many new discriminants should be a fruitful area for further investigation.

We only considered the irreducible $W/Z + b\bar{b}$ backgrounds to W/Z + H production. This
is the main reason we cannot translate our results into a final significance estimate. Indeed,
the reducible $t\bar{t}$ background is particularly important for WH, and the W/Z + jj background
with false positive b-tags is important for both. We believe that the $t\bar{t}$ background can be
easily tamed with a jet veto or a more sophisticated top anti-tag, and the jj background can
be studied in the same way that we have studied the $b\bar{b}$ background here. The $b$-tagging
quality could also be input as an additional variable to be optimized, rather than being fixed
at 60% efficiency.

Combining multiple discriminants, we compared the significance enhancement
characteristic to that coming from the set of variables used by CDF and D0. We found around
a 10-20% enhancement was in principle possible. Most of the variables used by CDF and
D0 are kinematical, taking advantage of distinctions which are apparent in the distribution
of the initial hard partons. Since we include many variables, such as pull, which are absent at
the hard parton level, it is understandable that some improvement should result. However,
our conclusion is only that enhancement may be possible. The same qualitative conclusion
holds for the LHC light Higgs boson search. We find that our variables work even better for
ZH at the LHC, partially because initial states for the background are dominantly $gg$ in a
$pp$ collider while the signal is $q\bar{q}$ initiated at either machine. We believe that while putting a
hard cut on $p_T$ reduces the problem to something more manageable, the Higgs boson can be
found without a fixed restriction on $p_T$, with potentially much larger significance. Moreover,
at the LHC, since the detectors have better resolution, the variables related to radiation
patterns and jet substructure may be more accurately measured. Both the conclusions about
the Tevatron and the LHC come with a few caveats: we have only compared to irreducible
backgrounds; we have not considered experimental or theoretical systematic uncertainty; and
we have not considered the accuracy to which all our variables can actually be measured.

In this paper, we have demonstrated that it is possible to perform comprehensive
multivariate analysis using Monte Carlo simulations. We have argued that SIC curves provide
a useful visualization, and that there is room for additional discovery potential for a light
Higgs boson at both the Tevatron and the LHC. Despite this intensive effort, there remain a
number of important questions which we were forced to defer to future work. It is important
to verify that many of the useful discriminants are stable with regard to different models of
the parton shower, as was done for pull in [9], and that the set of important variables is roughly
generator independent. It is also generally useful to have a better sense of how the Boosted
Decision Tree parameters should be chosen, and in which situations other methods would be
preferable. From the physics side, we believe that the dominant reducible $t\bar{t}$ background can
be removed using either a jet veto or, more importantly, a multi-variable discriminant similar
to the ones we have developed here. If that and the W/Z + jj sample can be characterized,
we could produce a more realistic significance estimate for the Higgs boson discovery reach.
Such a study should properly be done with fully reconstructed events. However, there is still
work which can be done on the theoretical side.

In summary, we have constructed a framework for evaluation and optimization of multivariate
searches. This can form the basis for future studies in important but difficult searches
at the LHC and Tevatron.

Acknowledgements 

Discussions with Michael Kirby illuminated much about the current D0 multivariate Higgs
boson search. This work was supported in part by the Department of Energy under grant
DE-SC003916. The computations in this paper were performed on the Odyssey cluster supported
by the FAS Research Computing Group at Harvard University.

References 

[1] http://www-cdf.fnal.gov/physics/new/hdg//Results_files/results/whlnubb_jul10/
http://www-cdf.fnal.gov/physics/new/hdg//Results_files/results/zhllbb_jul10/
comb/zhllbb_comb_web/

[2] http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/Higgs/H95/
http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/Higgs/H92/

[3] V. M. Abazov et al. [D0 Collaboration], arXiv:1008.3564 [hep-ex].

[4] Atlas Technical Design Report "Higgs Searches" 1999,
http://www.cern.ch/Atlas/GROUPS/PHYSICS/TDR/physics_tdr/printout/Volume_II/letter/Higgs_searches_letter.ps.gz

[5] J. M. Butterworth, A. R. Davison, M. Rubin and G. P. Salam, Phys. Rev. Lett. 100, 242001
(2008) [arXiv:0802.2470 [hep-ph]].

[6] ATL-PHYS-PUB-2009-088. ATL-COM-PHYS-2009-345.

[7] CDF note 10235 "A Search for the Standard Model Higgs Boson in the Process ZH -> l+l-bb
Using 5.7 fb-1 of CDF II Data" (July 16, 2010)

[8] D. Krohn, J. Thaler and L. T. Wang, JHEP 1002, 084 (2010) [arXiv:0912.1342 [hep-ph]].

[9] J. Gallicchio and M. D. Schwartz, Phys. Rev. Lett. 105, 022001 (2010) [arXiv:1001.5027 
[hep-ph]]. 

[10] R. K. Ellis, W. J. Stirling, B. R. Webber, "QCD and collider physics," Camb. Monogr. Part. 
Phys. Nucl. Phys. Cosmol. 8, 1-435 (1996). 

[11] J. Alwall et al, JHEP 0709, 028 (2007) [arXiv:0706.2334 [hep-ph]]. 






[12] T. Sjostrand, S. Mrenna and P. Z. Skands, Comput. Phys. Commun. 178, 852 (2008) 
[arXiv:0710.3820 [hep-ph]]. 

[13] M. Cacciari and G. P. Salam, Phys. Lett. B 641, 57 (2006) [arXiv:hep-ph/0512210]. 

[14] A. Djouadi, Phys. Rept. 457, 1-216 (2008). [hep-ph/0503172]. 

[15] M. Cacciari, G. P. Salam and G. Soyez, JHEP 0804, 063 (2008) [arXiv:0802.1189 [hep-ph]].

[16] A. Hoecker et al, TMVA Toolkit for Multivariate Data Analysis with ROOT,
http://tmva.sourceforge.net/.

[17] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework,
Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996, Nucl. Inst. & Meth. in Phys. Res. A
389 (1997) 81-86. See also http://root.cern.ch/.

[18] M. Bahr et al, Eur. Phys. J. C 58, 639 (2008) [arXiv:0803.0883 [hep-ph]].

[19] T. Chwalek [CDF and D0 Collaboration], arXiv:0705.2966 [hep-ex].

[20] L. G. Almeida, S. J. Lee, G. Perez, G. F. Sterman, I. Sung and J. Virzi, Phys. Rev. D 79, 
074017 (2009) [arXiv:0807.0234 [hep-ph]]. 

[21] C. F. Berger, T. Kucs and G. F. Sterman, Phys. Rev. D 68, 014012 (2003) 
[arXiv:hep-ph/0303051]. 

[22] S. D. Ellis, C. K. Vermilion, J. R. Walsh, A. Hornig and C. Lee, arXiv:1001.0014 [hep-ph]. 

[23] [D0 Collaboration], D0 Note 6087-CONF, "Search for the standard model Higgs boson in the
ZH->bbvv channel in 6.4 fb-1 of ppbar collisions at sqrt(s)=1.96 TeV", Preliminary Results for
Summer 2010 Conferences,
http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/Higgs/H90/, August 2010.

[24] J. D. Bjorken and S. J. Brodsky, Phys. Rev. D 1, 1416 (1970).

[25] Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Proc COLT,
209-217. ACM Press, New York (1996).

[26] B. P. Roe, H. J. Yang, J. Zhu, Y. Liu, I. Stancu and G. McGregor, Nucl. Instrum. Meth. A 543,
577 (2005) [arXiv:physics/0408124].

[27] B. P. Roe, H. J. Yang and J. Zhu, Prepared for PHYSTAT05: Statistical Problems in Particle
Physics, Astrophysics and Cosmology, Oxford, England, United Kingdom, 12-15 Sep 2005

[28] S. Catani, Y. L. Dokshitzer, M. H. Seymour and B. R. Webber, Nucl. Phys. B 406, 187 (1993).

[29] G. P. Salam and G. Soyez, JHEP 0705, 086 (2007) [arXiv:0704.0292 [hep-ph]]. 

[30] D. E. Soper and M. Spannowsky, JHEP 1008, 029 (2010) [arXiv:1005.0417 [hep-ph]]. 

[31] S. D. Ellis, C. K. Vermilion and J. R. Walsh, Phys. Rev. D 81, 094023 (2010) [arXiv:0912.0033 
[hep-ph]]. 

[32] J. M. Butterworth et al, arXiv:1003.1643 [hep-ph].

[33] D. E. Kaplan, K. Rehermann, M. D. Schwartz and B. Tweedie, Phys. Rev. Lett. 101, 142001 
(2008) [arXiv:0806.0848 [hep-ph]]. 

[34] G. P. Salam, Eur. Phys. J. C 67, 637 (2010) [arXiv:0906.1833 [hep-ph]]. 






